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This  study  is  concerned  with  the  development  of  algorithms  for 
minimizing  a nonlinear  objective  function  subject  to  linear  constraints. 
Special  attention  is  given  to  models  that  have  network  constraints  with 
possible  bounds  on  the  variables  because  they  have  many  applications  in 
areas  such  as  transportation  science,  engineering  and  economics. 

The  first  approach  studied  is  a restricted  version  of  the 
simplicial  decomposition  algorithm.  This  method  allows  the  user  to 
treat  the  maximum  size  of  the  generated  simplices  as  a parameter.  When 
the  parameter  is  at  its  minimum  value,  the  method  reduces  to  the  Frank- 
Wolfe  algorithm;  when  at  its  maximum,  it  is  the  original  simplicial 
decomposition  of  von  Hohenbalken  and  Holloway. 

The  second  approach  is  concerned  with  the  application  of  conjugate 
gradient  based  methods  to  solve  dual  formulations  of  problems  where  the 


v 


objective  function  is  separable,  strictly  convex  and  quadratic.  These 
methods  exploit  both  the  sparsity  and  structure  of  the  constraint 
matrix. 

Conditions  for  finite  and  superl inear  convergence  of  these 
techniques  are  also  discussed,  and  computational  results  on  a variety  of 
large  test  problems  are  presented. 
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CHAPTER  ONE 


INTRODUCTION  AND  OVERVIEW 

1.1  Introduction 

This  dissertation  is  concerned  with  the  development  of  efficient 
algorithms  for  solving  large-scale  nonlinear  optimization  problems.  The 
primary  focus  is  on  models  with  a pseudoconvex  objective  function  and 
network  constraints.  These  models  have  several  important  applications 
such  as  the  standard  traffic  assignment  problem  [Beckmann  1951], 
electrical  network  problems  [Cooper  and  Kennington  1977],  water 
distribution  models  [Collins  et  al . 1978],  economic  and  financial  models 
[Bachem  and  Korte  1977,  Mulvey  1985],  computer  and  communication  network 
models  [Cantor  and  Gerla  1974],  and  many  others.  However,  these 
algorithms  are  also  applicable  to  problems  with  general  linear 
constraints  and,  therefore,  this  research  can  be  considered  as 
applicable  to  a broad  class  of  large-scale  models. 

The  techniques  considered  in  this  study  are  divided  in  two  classes: 
simplicial  decomposition  and  dual  methods. 

Simplicial  decomposition  is  a simple  and  direct  method  for  dealing 
with  large-scale  pseudoconvex  optimization  problems,  especially  when  the 
constraints  are  linear.  The  term  "simplicial  decomposition"  is  due  to 
von  Hohenbalken  [1975,  1977],  but  the  essential  idea  of  this 
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decomposition  has  also  been  called  "inner  linearization/restriction"  by 
Geoffrion  [1970]  and  "the  extended  Frank-Wolfe  method"  by  Holloway 
[1974].  The  decomposition  iterates  by  alternatively  solving  a linear 
programming  subproblem  that  generates  extreme  points  of  the  feasible 
region  and  a nonlinear  master  problem  on  the  convex  hull  of  a subset  of 
previously  generated  extreme  points  (a  simplex). 

Thus,  simplicial  decomposition  may  be  viewed  as  a form  of  modular 
nonlinear  programming,  provided  that  one  has  an  effective  computer  code 
for  solving  the  nonlinear  master  problem,  as  well  as  access  to  a code 
which  can  take  advantage  of  the  linearity  of  the  subproblem,  including 
any  special  structure  which  might  be  present.  Recent  developments  in 
computational  mathematical  programming  have  provided  such  codes  for  many 
important  applications,  especially  those  which  are  modeled  as  network 
problems.  For  such  models  the  linear  subproblem  can  be  solved  by  an 
appropiate  network  code  such  as  GNET  [Bradley  et  al.  1977],  RNET 
[Grigoriadis  and  Hsu  1979],  NETFLO  [Kennington  and  Helgason  1980],  or, 
in  some  instances,  by  a shortest  path  code  [Syslo  et  al.  1983].  The 
master  problem  may  be  solved  by  a nonlinear  code  such  as  NPSOL  [Gill  et 
al.  1983],  NLPQL  [Schittkowski  1984],  or  the  projected  Newton  method 
[Bertsekas  1982],  all  of  which  have  the  important  feature  of  super! inear 
convergence.  Without  this  feature  the  overall  procedure  might  prove 
ineffective  because  too  much  time  would  be  consumed  for  each  master 
iteration. 

The  solution  of  a large-scale  mathematical  program  can  often  be 
facilitated  by  exploiting  its  special  structure.  For  instance,  it  is 
well-known  that  the  dual  of  a strictly  convex  quadratic  program  subject 
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to  linear  equality  constraints  is  an  unconstrained  quadratic  program 
with  as  many  variables  as  primal  constraints  [Wolfe  1961].  If,  in 
addition,  the  primal  problem  is  separable  and  has  network  constraints, 
the  dual  is  very  easy  to  find  and  does  not  require  the  storage  of  any 
matrix  since  the  Hessian  can  be  generated  from  node-arc  network 
information.  When  the  constraints  are  linear  inequalities  the  dual  is 
also  a quadratic  program  but  subject  to  simple  nonnegativity  constraints 
on  the  variables.  For  the  special  case  of  network  problems,  inequality 
constraints  appear  when  the  flow  on  the  arcs  is  subject  to  upper  and 
lower  bounds. 

The  second  part  of  this  study  discusses  conjugate  gradient  based 
methods  for  solving  dual  formulations  of  separable  strictly  convex 
quadratic  network  problems.  These  methods  are  especially  useful  for 
large  problems  because  they  have  minimum  memory  requirements,  matrix 
operations  can  be  simplified,  and  are  finitely  convergent  under  exact 
arithmetic. 

1.2  Overview 

Chapter  Two  gives  a survey  of  the  most  well-known  solution 
approaches  for  solving  large-scale  convex  or  pseudoconvex  optimization 
problems  with  linear  constraints  that  have  been  succesfully  applied  to 
networks.  This  survey  includes  the  Frank-Wolfe  method  [Frank  and  Wolfe 
1956],  the  convex  simplex  method  [Zangwill  1967],  the  reduced  gradient 
method  [Wolfe  1967],  the  piecewise-1 inear  approximation  method  [Charnes 
and  Lemke  1954],  the  primal  truncated  Newton  method  [Dembo  1983],  and 
the  relaxation  method  [Rockafellar  1984]. 
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Chapter  Three  presents  a restricted  version  of  the  simplicial 
decomposition  algorithm  in  which  the  user  can  control  the  size  of  the 
master  problem  by  a parameter,  r,  which  limits  the  number  of  points 
defining  the  current  simplex  at  each  iteration.  When  r=l,  the  method  is 
the  familiar  algorithm  of  Frank-Wolfe  and,  when  r is  greater  than  the 
number  of  variables,  the  method  is  the  original  simplicial 
decomposition. 

Furthermore,  we  will  obtain  conditions  under  which  the  rules  for 
dropping  and  retaining  generated  points  guarantee  that  the  method  will 
terminate  after  a finite  number  of  major  iterations.  Under  these 
conditions,  the  overall  convergence  rate  will  be  that  of  the  algorithm 
selected  for  the  master  problem  (e.g.,  superl inear) . We  also  discuss  a 
simpler  formulation  of  the  master  problem  and  apply  to  it  the  projected 
Newton  method  of  Bertsekas  [1982]  and  a quasi-Newton  version  based  on 
the  BFGS  formula  [Dennis  and  More  1977],  both  of  which  have  the  property 
of  superl inear  convergence. 

An  extensive  computational  study  of  the  basic  restricted  simplicial 
decomposition  algorithm  is  also  presented.  The  test  problems  consist  of 
the  linearly  constrained  Colville  problems  [Colville  1968],  several 
traffic  assignment  models  [Florian  et  al.  1983]  which  have  become 
standard  test  problems,  some  simple,  but  large,  electrical  networks 
which  where  obtained  from  RCA  physicists  [Balberg  et  al.  1983],  and  the 
Dallas  water  distribution  problem  [Collins  et  al . 1978].  The 
experiments  support  the  conclusion  that  when  r can  be  chosen 
sufficiently  large  the  decomposition  rapidly  achieves  a solution  to  high 
accuracy.  On  problems  where  this  is  not  possible,  such  as  electrical 
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network  problems,  the  experiments  indicate  that  the  approach  will 
produce  an  objective  value  within  1 or  2 percent  of  the  optimal  in  a 
relatively  small  amount  of  computer  time. 

The  first  part  of  Chapter  Four  specializes  the  conjugate  gradient 
method  of  Hestenes  and  Stiefel  [1952]  to  solve  separable  strictly  convex 
quadratic  network  problems  with  undirected  arcs.  The  network  data 
structure  is  used  to  reduce  memory  requirements  and  computational  effort 
in  matrix  operations.  Computational  experiments  with  electrical 
networks  [Balberg  et  al . 1983]  show  that  this  approach  is  much  superior 
to  the  primal  method  of  Chapter  Three. 

At  the  end  of  Chapter  Four,  we  deal  with  a dual  ascent  method  for 
solving  a Lagrangian  relaxation  of  separable  strictly  convex  quadratic 
networks  with  upper  and  lower  bounds  on  the  arcs.  The  ascent  directions 
of  this  procedure  are  generated  by  conjugate  gradient  formulas 
(i.e.,  Fletcher  and  Reeves  [1964],  Polak  and  Ribiere  [1969]). 


CHAPTER  TWO 


PROBLEM  STATEMENT  AND  PAST  WORK 

2.1  Problem  Statement 

The  main  concern  of  this  dissertation  is  with  the  solution  of  the 
following  nonlinear  programming  problem 

min  f(x) 

s.t.  A x = b (2.1.1) 

1 £ x s u 

where  f (•)  is  a continuously  differentiable  pseudoconvex  function, 

1 € (R  U {±°°})  , u € (R  U { -*} ) n and  b € Rm  are  constant  vectors,  x € Rn 
is  the  vector  of  decision  variables,  and  A is  an  mxn  matrix.  In  many 
applications  A will  be  the  node-arc  incidence  matrix  of  a graph  G(V,  E), 
where  V is  the  set  of  nodes  with  cardinality  m and  E is  the  set  of  arcs 
with  cardinality  n. 

2.2  Past  Work 

During  the  past  decade  a significant  amount  of  research  has  been 
done  in  the  development  of  new  methods  or  extensions  of  existing 
techniques  for  solving  large-scale  nonlinear  optimization  problems  of 
the  form  (2.1.1),  with  special  emphasis  on  problems  with  network 
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constraints.  Collins  et  al.  [1978],  Hanscom  et  al.  [1978],  Jacoby  and 
Kowalik  [1980],  Dembo  and  Klincewicz  [1981],  Dembo  [1983],  Klincewicz 
[1983],  and  Kamesam  and  Meyer  [1984]  have  studied  the  determination  of 
steady-state  flows  and  pressures  in  water  distribution  networks.  Bachem 
and  Korte  [1977],  Cottle  and  Duval  [1982]  and  Kamesam  and  Meyer  [1984] 
estimate  data  elements  in  input-output  tables  by  solving  quadratic 
transportation  problems.  Bertsekas  [1976],  Rosenthal  [1981]  and  Kamesam 
and  Meyer  [1984]  have  studied  the  determination  of  optimal  schedules  in 
reservoir  management  problems.  Determining  the  user  equilibrium 
conditions  in  traffic  networks  has  been  studied  by  Dafermos  and  Sparrow 
[1969],  Nguyen  [1974,  1976],  LeBlanc  et  al . [1975],  Dembo  and  Tulowitzki 
[1983b],  Lawphongpanich  and  Hearn  [1983],  Guelat  [1983]  and  many  others. 
These  latter  problems  have  multiple  commodities. 

Below,  we  summarize  some  of'  the  most  relevant  solution  approaches 
that  have  been  used  for  (2.1.1). 

2.2.1  Frank-Wolfe  Method 

The  Frank-Wolfe  (FW)  method  [Frank  and  Wolfe  1956]  is  the  simplest 
and  most  straighforward  method  for  solving  pseudoconvex  programming 
problems  with  linear  constraints.  Suppose  that  xk  is  a feasible 
point  at  iteration  k,  and  let  yk  solve  the  following  linearized 
subproblem 

min  6f (xk)y 
s.t.  A y = b 
1 S y s u 
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where  6f(xk)  is  the  gradient  of  f (•)  at  xk.  Let  ak  be  a solution  to  the 

problem  min  {f(axk  + (1  - a)yk)  : 0 s a <;  1],  then  the  new  iterate 

becomes  xk+1  = akxk  + (1  - ak)yk.  If  f(.)  is  convex,  f(xk)  + 
k k k 

5f(x  ) (y  - x ) is  a lower  bound  for  the  optimal  function  value,  and  the 
algorithm  terminates  when  this  bound  is  within  a certain  tolerance. 

The  FW  algorithm  has  appeal,  especially  for  traffic  assignment 
problems,  since  the  subproblem  becomes  a set  of  independent  shortest 
path  problems. 

Although  the  FW  method  progresses  rapidly  in  the  early  iterations, 
it  is  known  to  converge  rather  slowly  as  the  optimal  solution  is 
approached.  Several  authors,  such  as  Collins  et  al . [1978],  Florian  et 
al . [1983]  and  LeBlanc  et  al.  [1985],  have  improved  its  convergence  by 
introducing  a Partan  direction  and  line  search  at  each  iteration. 

2.2.2  Convex  Simplex  Method 

The  convex  simplex  (CS)  method  was  introduced  by  Zangwill  [1967] 
and  may  be  viewed  as  a generalization  of  the  linear  simplex  method.  Let 

k 

x denote  a feasible  point  at  iteration  k with  1 s x s u replaced  by 
0 s x s u'.  The  linear  equality  constraints  are  partitioned  as 

= b 

where  xg  and  x^  denote  the  basic  and  nonbasic  variables  with  values  of 
k k \s 

x8  anc*  XN’  resPectively.  We  also  partition  the  gradient  6f(xK)  into 
basic  and  nonbasic  components  such  that  6f(xk)  = [6fg(xk),  6fN(xk)]. 


A x = [B,  N] 
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Let 

rN  = 6fN(xl<)  ‘ NTB-15fBCxk). 

Then  a nonbasic  variable  is  a candidate  for  value  change  if 

either  r..  < 0 and  x ^ < ul  or  r^.  > 0 and  x^  > 0.  If  there  is  no 

• k v 

candidate,  then  x is  a solution.  Otherwise,  let  xK  denote  the 

J 

candidate.  The  problem  then  becomes  determining  a step  vector 
k k k 

<$x  - (6Xg,  <5 x^j ) which  reduces  the  value  of  f(x)  while  forcing  x^  + 5x^ 

to  satisfy  the  constraints.  The  only  nonbasic  variable  that  changes  is 
k k 

xj  and  <SXg  will  be  changed  so  that  the  constraints  remain  satisfied.  If 
^Xj  1S  zer,°  or  the  maximum  allowable  change,  a pivot  operation  is 
performed  where  the  leaving  variable  is  any  basic  variable  at  zero  or 
its  upper  bound.  Otherwise,  B ^ remains  unchanged. 

Collins  et  al . [1978]  applied  this  method  to  network  problems 
exploiting  the  underlying  network  structure  using  the  same  basic  design 
as  the  linear  network  code  NETFLO  [Kennington  and  Helgason  1980]. 

2.2.3  Reduced  Gradient  Method 

The  reduced  gradient  (RG)  method  was  first  developed  by  Wolfe 
[1967]  for  linearly  constrained  problems  and  later  generalized  by  Abadie 
and  Carpenter  [1969]  to  handle  nonlinear  constraints. 

Here,  we  describe  a version  of  the  RG  method  for  large-scale 
linearly  constrained  problems  by  Murtaugh  and  Saunders  [1978]  that  has 
been  the  basis  for  more  specialized  methods  for  network  problems.  The 
linear  equality  constraints  are  partitioned  as 
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A x = [B,  S,  N] 


= b 


where  the  basic  variables,  Xg,  are  used  to  satisfy  the  equality 
constraints,  the  superbasic  variables,  Xg,  are  allowed  to  vary  to 
minimize  f(x),  and  the  nonbasic  variables,  x^,  are  fixed  at  their 
bounds.  B is  a square  nonsingular  matrix.  At  each  step,  the  problem 
then  becomes  determining  a step  vector  6x  = (<$Xg,  6Xg,  <5xN)  which 
reduces  the  value  of  f(x)  while  forcing  x + 5x  to  satisfy  the 
constraints.  Since  x^  is  fixed,  5x^  = 0,  and  allowing  6x<-  to  be  chosen 
freely,  5xg  is  determined  by 

6xg  = -B-1S  6xs. 


Considering  the  unconstrained  problem  of  minimizing  f(x)  as  a 
function  of  the  current  set  of  superbasic  variables,  xs,  the  reduced 
gradient  h of  f (x ) with  respect  to  x$  is  given  by 

h = 6 f $ ( x ) - STtt 
it  = B'1  5fg  (x ) 


where  6fg(x),  6fg(x)  and  6 f ^ ( x ) are  the  components  of  6f(x) 
corresponding  to  xg,  xs  and  xN,  respectively.  Theoretically,  the 
algorithm  continues  until  j|h||  = 0,  in  which  case  5xs  = 0,  and  x$  is  a 
solution  in  the  manifold  defined  by  Xg  and  x^.  The  Lagrangian 
multipliers  x defined  by 
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x = 6fN(x)  - NTtt 

are  computed,  and  if  t.  z 0 for  each  xi  in  xN  at  its  lower  bound  and 
Ti  * 0 for  each  xi  XN  at  its  upper  bound,  x is  a solution  of  (2.1.1). 
If  not,  a set  of  candidates  can  be  chosen  from  x^  to  become  superbasic, 
and  a new  unconstrained  problem  is  solved. 

Murtaugh  and  Saunders  use  both  quasi-Newton  and  conjugate  gradient 
methods  for  minimization  on  the  manifold. 

Several  authors  have  considered  variants  of  the  RG  algorithm  for 
network  problems:  Dembo  and  Klincewicz  [1981]  make  explicit  use  of 
second-order  information  and  exploit  network  programming  data  structure. 
Their  algorithm  includes  a heuristic  for  conditioning  the  basis,  a 
maximal  basis  procedure,  and  uses  a scaled  search  direction.  Beck  et 
al.  [1983]  use  a broad  class  of  conjugate  gradient  methods  to  generate 
the  search  direction  and  make  the  algorithm  useful  for  nonseparable 
objectives. 

2.2.4  Piecewise-1 inear  Approximation  Method 

The  piecewise-linear  approximation  (PLA)  method  for  separable 

convex  programs  was  first  introduced  by  Charnes  and  Lemke  [1954]. 

. n 

Assume  that  the  objective  function  of  (2.1.1)  is  f(x)  = z f-(x-).  This 

i=l  1 1 

method  simply  chooses  a set  of  grid  points  in  the  interval  ( 1 ^ , u_.)f  for 
i - i > •••>  n,  and  replaces  the  functions  f^.(*)  by  piecewise-linear 
approximations  determined  by  the  function  values  at  the  grid  points. 

Then  the  approximating  problem  becomes  a linear  program.  Suppose  that 
Ki  linear  segments  are  to  be  used  in  the  approximation  of  f ^ (•) . Then 
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the  interval  (1 i , u^  is  partitioned  into  Ki  segments  each  of  length 

u ik * ^or  ^ ~ ^ 5 suc^  that  - 1^ . Let  denote  the 

end  points  of  the  K..  segments.  That  is 

k 


K, 


Let  x i = li  + z\ik  where  denotes  the  value  in  segment  k of 

(V  “l>  Wlth  C0St  0f  S-k  = ^i^Vi,k-Hl>  ' fi^ik))/Uik* 

(2.1.1)  can  be  approximated  by 

n n K. 

min  E Ml,)  ♦ z z'c,kx1k 

1=1  1=1  |<  = 1 1K  1K 


s.  t.  A x = b 


0 5 xik  5 uik 


k = 1, 

i = 1, 


, K< 


n. 


Given  that  | <S f ^ ( x ) JJ  s a^,  for  i = 1,  ...,  n,  it  is  possible  to 
obtain  error  bounds  on  the  optimal  function  value  depending  on  the 
constants  a.,  the  length  of  the  maximum  segment  used  in  approximating 
the  functions  f.. (•)  and  the  dual  variables  obtained  by  solving  the 
linearized  subproblem  (see,  e.g.,  Bazaraa  and  Shetty  [1979]). 

Several  authors  have  used  modifications  of  this  technique  for 
network  problems:  Collins  et  al . [1978]  suggested  a heuristic  to  find 
the  unit  costs  c^k  based  on  the  idea  of  least  squares.  Kamesam  and 
Meyer  [1984]  discuss  two  algorithms.  The  first  uses  a local  piecewise- 
1 inear  approximation  with  a fixed  number  of  segments  ( two,  four  or  six 
segments  per  variable),  and  the  second  uses  an  implicit  global 
approximation  to  the  objective  function  that  is  modified  at  each 
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iteration.  This  second  uses  at  most  two  piecewise-1 inear  segments  per 
variable  at  any  time. 

2.2.5  Primal  Truncated  Newton  Method 

The  primal  truncated  Newton  (PTN)  method  was  introduced  and 
specialized  to  nonlinear  networks  by  Dembo  [1983].  The  PTN  algorithm  is 
a feasible  direction  method  that  fits  into  the  convergent  framework  for 
large-scale  optimization  of  Dembo  and  Sahi  [1983].  This  framework 
involves  two  types  of  feasible  directions: 

i)  Restricted  directions:  feasible  descent  directions  restricted 
to  lie  in  a subspace  containing  the  current  active  set. 

ii)  Relaxing  directions:  feasible  descent  directions  along  which 
one  or  more  constraints  may  be  relaxed. 

The  relaxing  steps  play  an  important  role  in  the  PTN  algorithm. 

This  role  is  to  facilitate  rapid  identification  of  the  optimal  active 
set  of  constraints.  For  large  problems  it  is  crucial  for  the  relaxing 
step  to  be  able  to  make  radical  changes  to  the  active  set.  One  way  to 
do  so  economically  is  to  move  in  the  direction  of  the  negative  gradient 
projected  onto  the  box  constrained  problem  (reduced  problem  with  upper 
and  lower  bounds  only). 

For  the  restricted  directions,  PTN  uses  a primal  feasible  truncated 
Newton  direction.  Considering  the  same  partition  of  the  set  of  equality 
constraints  of  Subsection  2.2.3,  define  Z by 
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such  that  A Z = 0. 

The  restricted  direction  is  then  given  by  p = Zps,  and  the  reduced 
gradient  and  Hessian  are  ZT6f(x)  and  ZTH(x)Z,  respectively,  where  H(x) 
is  the  Hessian  of  f(»)  at  x.  The  descent  direction  ps  is  computed  by 
solving  the  reduced  Newton  equations 

(ZTH(x)Z)ps  = -ZT5f(x) 

inexactly  using  a linear  conjugate  gradient  method.  A forcing  sequence 
is  employed  to  reduce  the  inexactness  as  the  method  iterates.  Accuracy 
is  defined  according  to  the  relative  residual  ||r||/||ZT5f (x) ||  in  which 
r = (ZTH(x)Z)p$  + ZT5f(x). 

The  global  convergence  of  this  method  is  based  on  the  fact  that  the 
dropping  of  constraints  is  done  according  to  a "forcing  sequence 
strategy"  and  that  the  relaxing  steps  are  "sufficiently  long." 

2.2.6  Relaxation  Method 

Relaxation  (R)  techniques  draw  from  the  field  of  duality  theory  and 
monotropic  programming  (see,  e.g.,  Rockafellar  [1984]).  Apparently, 
Birkhoff  and  Diaz  [1956]  were  the  first  to  use  relaxation  techniques  for 
convex  networks.  More  recently,  they  have  been  used  by  Bertsekas  and  El 
Baz  [1984]  and  Zenios  and  Mulvey  [1986]. 


Z = 


-B_1S 
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Consider  the  problem 


min  z f.-fx..) 

(i,j)€E  1J 

s*t.  E x..  - z x..  = b-  for  all  i € V 
(1.J)€E  3 (j,i)6E  J1  1 


'ij  S Xij  S Uij 


for  all  ( i , j ) € E 


where 


W = i 


for  Tj  s xij  5 uij 


otherwise 


and  f-jj(#)  is  strictly  convex. 

The  dual  is  the  following  unconstrained  problem 


min  z f.,(ir.  - tt.) 

(i,j)€E  1J  1 J 

* 

where  f ^ j ( • ) is  the  convex  conjugate  function  of  f defined  by 

fij(tij)  =rP  {tiJXiJ  ' fij(Xij)} 

xij 

and  it  is  defined  as  the  price  vector.  The  ith  price  is  really  a 
Lagrangian  multiplier  associated  with  the  ith  conservation  of  flow 
constraint. 

Rockafellar  [1970]  gives  necessary  and  sufficient  conditions  for 

optimality  of  the  pair  (x,  ir),  i.e.,  a feasible  flow  vector  x and  a dual 

vector  tt  are  optimal  if  and  only  if  for  all  (i,j)  € E,  tt.  - tt  . is  a 

0 

subgradient  of  f..(x..)  or,  equivalently,  x..  = 5f*.(ir.  - it.). 

J J IJ  'J"1J 
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Since  the  dual  problem  is  unconstrained  and  differentiable,  the 
natural  solution  approaches  are  based  on  iterative  descent  procedures 
such  as  the  Gauss-Siedel  scheme  for  solving  systems  of  equations. 
Optimization  is  achieved  for  one  node  at  every  iteration  given 
information  at  adjacent  nodes. 


CHAPTER  THREE 


RESTRICTED  SIMPLICIAL  DECOMPOSITION 

3.1  Background 

The  method  described  in  this  chapter  is  based  on  the  standard 
simplicial  decomposition  (SD)  algorithm  first  described  by  von 
Hohenbalken  [1975,  1977]  and  Holloway  [1974]  for  solving  nonlinear 
programming  problems  with  a pseudoconvex  objective  function  and  linear 
constraints.  Conceptually,  this  method  iterates  by  alternately  solving 
a linear  programming  subproblem  and  a nonlinear  master  problem  with 
simple  convexity  constraints.  The  subproblem  generates  extreme  points 
of  the  feasible  region,  and  the  master  problem  finds  the  optimum  of  the 
objective  function  within  the  convex  hull  of  a subset  of  previously 
generated  points  (a  simplex). 

The  restricted  version  of  the  simplicial  decomposition  (RSD) 
algorithm  was  originally  suggested  by  Lawphongpanich  and  Hearn  [1983] 
with  application  to  the  traffic  assignment  problem.  They  showed  that  by 
retaining  some  old  iterates  together  with  some  extreme  points,  the  size 
of  the  master  problem  could  be  controlled  by  the  user  with  a parameter 
r.  Computational  experiments  indicated  that  the  progress  of  the 
algorithm  improved  as  r increased.  The  master  problem  was  solved 
inexactly  by  a projection  method  on  a quadratic  approximation  of  the 
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objective  function  using  a diagonal  matrix.  This  method  has  a linear 
rate  of  convergence. 

The  RSD  algorithm  described  in  this  chapter  has  two  important  new 
features: 

i)  It  will  be  shown  that  when  the  optimal  solution  is  unique  and  r 
is  larger  than  the  dimension  of  the  optimal  face  (see  Wolfe  [1970])  of 
the  feasible  region,  RSD  converges  in  a finite  number  of  major  cycles. 

ii)  The  master  problem  is  solved  to  optimality  using  the  projected 
Newton  method  of  Bertsekas  [1982]  and  a quasi-Newton  version,  both  of 
which  have  a superlinear  rate  of  convergence. 

3.2  Prel iminaries 

i 

As  stated  in  Section  2.1,  we  are  concerned  with  the  following 
problem 


min  f (x) 

s.t.  A x = b (3.2.1) 

1 s x s u 

where  1 € (R  U {±»})n,  u € (R  U {±»})n,  x € Rn,  b 6 Rm  and  A is  an  mxn 
matrix. 

Consider  the  following  assumptions: 

Assumption  3.2.1:  f(»)  is  a continuously  differentiable  pseudoconvex 
function. 

Assumption  3.2.2:  The  feasible  region  of  (3.2.1),  denoted  as  S,  is 


bounded. 
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Under  Assumption  3.2.2  any  element  of  S may  be  expressed  as  a 
convex  combination  of  its  extreme  points,  of  which  there  are  only  a 
finite  number.  Thus,  (3.2.1)  can  be  rewritten  in  terms  of  the  extreme 
points  as 

N 

min  f(  z a. a.) 
i=l  1 1 

N 

s-t.  S o.  = 1 (3.2.2) 

i=l  1 

a-  * 0 i=  1,  . . . , N 

where  N is  the  total  number  of  extreme  points  and  a^  corresponds  to  the 
ith  extreme  point  of  S. 

The  following  definition  of  a simplex  from  Rockafellar  [1970]  is 
important  for  the  understanding  of  the  algorithm. 

Definition:  Let  {zQ,  z^,  ...,  zp]  be  p+1  distinct  points  in  Rn  with 
p s n,  where  the  vectors  zx  - zQ,  z2  - zQ,  . . . , zp  - zQ  are  linearly 
independent.  Then,  the  set  C = H(zQ,  zp  ...,  z ),  the  convex  hull  of 
fz0»  zi>  •••»  zp}>  is  called  a p-simplex  in  Rn.  In  addition,  since  C is 
always  contained  in  a manifold  of  dimension  p,  a p-simplex  is  said  to 
have  dimension  p,  written  as  dim  C = p. 

Caratheodory 1 s Theorem  (see,  e.g.,  Bazaraa  and  Shetty  [1979]) 
assures  us  that  not  only  can  every  x 6 S be  written  as  a convex 
combination  of  at  most  n+1  extreme  points,  but  that  x lies  in  the 
relative  interior  of  at  least  one  p-simplex  defined  by  p+1  extreme 
points  with  p S n. 
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Now,  after  the  following  notation,  we  can  state  the  original  SD 
algorithm. 

Notation: 

i)  Ws  = the  set  of  retained  extreme  points. 

ii)  Wx  = a set  which  may  be  empty  or  contains  one  of  the 

iterates. 

iii)  W = W U W . 

iv)  J W | = the  cardinality  of  W. 

v)  H(W)  = the  convex  hull  of  W,  i.e., 

H(W)  = {x  : x = Z a^a^ , E a.  = 1,  a.  i 0 and  z.  € W}. 

vi)  xy  = the  inner  product  of  the  vectors  x and  y. 

vii)  ||x||  = the  euclidean  norm  of  the  vector  x. 

SD  Algorithm 

Step  0:  Let  x°  be  a feasible  point  of  S.  Set  w|j=0  and  k=0. 

Step  1:  (Subproblem) 

Let  yk  € arg  min  {5f(xk)y  : y € S}. 

If  6f(x  ) (y  - x ) > 0,  xk  is  a solution  and  terminate. 

Otherwise,  set  Wk+1=Wk  U {yk},  and  go  to  2. 

Step  2:  (Master  problem) 

Let  xk+1  6 arg  min  { f (x ) : x 6 H(Wk+1)}. 

|<+1 

Purge  W$  of  all  the  elements  with  zero  weight  in  the 
• k+l 

expression  of  x as  a convex  combination  of  the  elements 
of  Wk  Set  k=k+l  and  go  to  1. 
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3 . 3 Restricted  Simplicial  Decomposition  and  Global  Convergence 

In  this  section  the  basic  version  of  RSD  is  stated  and  its 
convergence  is  proved.  This  basic  algorithm  differs  from  the  one  above 
in  two  respects: 

i)  The  master  problem  feasible  region  here  may  be  defined  by  a 
prior  iterate  as  well  as  generated  extreme  points. 

ii)  When  the  number  of  retained  extreme  points  equals  a parameter 
r,  the  incoming  extreme  point  replaces  the  extreme  point  with  minimum 
weight  in  the  expression  of  the  current  iterate  as  a convex  combination 
of  the  retained  extreme  points  and  a prior  iterate.  This  minimum  weight 
"dropping  rule"  is  given  for  definiteness  and  for  the  finiteness  proof 
of  the  next  section. 


RSD  Algorithm 

Step  0:  Let  x°  be  a feasible  point  of  S.  Set  w|?=0,  wj={x0}  and 
k=0. 

Step  1:  (Subproblem) 

Let  yk  6 arg  min  {<5f(xk)y  : y € S}. 

If  5f(x  )(y  - xk)  £ 0,  xk  is  a solution  and  terminate. 

Otherwise, 

1 ) If  IWjI  < r,  set  Wk+1=Wk  U {yk}  and  Wk+1=Wk. 

° b b XX 

I 

ii)  If  |W$|  = r,  replace  the  element  of  W*  with  the  minimal 
weight  in  the  expression  of  xk  as  a convex  combination 
of  Wk  with  yk  to  obtain  Wk+1  and  let  wk+1={xk}. 

Set  Wk  1=Wk  *11  Wk+1  and  go  to  2. 
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Step  2:  (Master  problem) 

Let  xk+1  6 arg  min  { f (x ) : x € H(Wk+1)}. 
k+l  k+1 

Purge  W$  and  Wx  of  all  the  elements  with  zero  weight  in 

the  expression  of  x as  a convex  combination  of  the 
elements  of  Wk+1.  Set  k=k+l  and  go  to  1. 

As  constructed,  the  feasible  region  for  the  master  problem, 

k + l i 

H(W  ),  always  contains  the  current  iterate,  x , and  the  incoming 
extreme  point,  yk.  When  xk  is  not  a solution,  yk  - xk  is  a descent 
direction,  thereby  ensuring  a decrease  in  the  objective  function  value 

k+l 

at  the  new  iterate,  x . The  convergence  proof  then  follows  from  the 
fact  that  x solves  the  master  problem.  The  argument  for  the  proof 
below  is  similar  to  that  of  Holloway  [1974]  and  Lawphongpanich  and  Hearn 
[1983]. 

Lemma  3.3.1:  6f(x  )(y  - x ) £ 0 for  all  y € S if  and  only  if  x is  an 
optimal  solution  to  (3.2.1). 

£roof:  This  is  a standard  nonlinear  programming  result. § 

Lemma  3.3.2:  If  xk  is  not  optimal  to  (3.2.1),  then  f(xk+1)  < f(xk). 

k 

Proof:  As  noted  earlier,  x is  feasible  to  the  master  problem.  Since 

k + l 

x solves  the  master  problem 
f(xM)  5 f(xk). 

k+i  k 

If  f(x  ) = f(x  ),  then  x is  also  a minimum  and 
6f(xk)(y  - xk)  * 0 for  all  y € H(Wk+1) 
but  this  is  a contradiction  since  yk  € Wk+1.| 
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Lemma  3.3.3:  Let  {x  } be  the  sequence  generated  by  RSD.  Then,  there 

k 

cannot  be  a subsequence  {x  with  the  following  three  properties: 

i)  xk  — > x“,  k 6 K 

ii)  yk  _>  y°° t k € K 

iii)  5f(x")(y“  - x")  < 0. 

Proof:  (By  contradiction)  Assume  that  such  a subsequence,  K,  exists, 
i.e.,  there  exists  a r > 0 such  that 

6f(x  )(y  - X ) < -r. 

Since  f(«)  is  continuously  differentiable,  there  exists  a it  € K 
sufficiently  large  so  that  for  any  k > tt 

6f(xk)(yk  - xk)  < -r/2 

and  a $ > 0 such  that  for  0 s t s $ 

[5f(xk  + t(yk  - xk) ) ] (yk  - xk)  < -r/4 
• k k 

By  construction,  both  y and  x are  feasible  to  the  master  problem 

at  iteration  k.  Hence,  there  exists  t € (0,  1)  such  that 
k ^ k k 

X + t (y  - x ) is  feasible  to  the  master  problem.  By  the  optimality 
of  xk+1 

f(xk*1)  s f(xk  * t*(yk  - xk)) 
and  by  Taylor's  expansion 

f(xk+1)  S f(xk)  + Sf(xk  + at*(yk  - xk))t*(yk  - xk) 
where  0 S a S 1.  Since  0 s at  s $ 
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f(xk+1)  s f(xk)  - t*(r/4).  (3.3.1) 

Because  f {•)  is  continuous  and  {f(xk)}  is  monotonical ly  decreasing 
(see  Lemma  3.3.2),  the  limit  of  {f(xk)},  f(x“),  exists.  For  k 
sufficiently  large 

f(x")  * f(xk)  - t*(r/8) . (3.3.2) 

For  k > ir  and  sufficiently  large  so  that  (3.3.2)  holds,  (3.3.1)  and 
(3.3.2)  imply  that 

f(xa>)  * f(xk)  - t*(r/8)  * f(xk)  - t*(r/4)  ^ f(xk+1)  > f(xk). 
k 

Since  { f (x  )}  decreases  monotonically,  this  is  a contradiction. 
Thus,  the  lemma  is  proved. § 

Theorem  3.3.1:  Given  that  f(*)  is  continuously  differentiable,  RSD 
either  terminates  at  a solution  or  generates  a sequence  { x k } for  which 
any  subsequential  limit  is  a solution. 

Proof:  If  RSD  terminates,  the  current  iterate  xk  must  satisfy  the 

stopping  criterion  in  Step  1.  By  Lemma  3.3.1,  xk  must  be  a solution. 

k 

When  a sequence  {x  } is  generated.  Lemma  3.3.3  guarantees  that  the 
limit  of  every  convergent  subsequence  is  a solution. §j 

In  addition,  when  (3.2.1)  has  a unique  solution,  e.g.,  when  f(*)  is 
strictly  pseudoconvex , the  entire  sequence  {xk}  converges  to  the  optimal 
solution  (see,  e.g.,  Bazaraa  and  Shetty  [1979]). 
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3.4  Finite  Convergence 

The  von  Hohenbalken  [1975,  1977]  proof  of  finiteness  for  the 
standard  SD  relies  on  the  fact  that  all  positively  weighted  extreme 
points  are  retained  from  one  iteration  to  the  next.  His  proof  also 
utilizes  Caratheodory ' s Theorem  which  implies  that  at  most  n+1  extreme 
points  define  each  iterate.  For  these  reasons,  the  proof  does  not  apply 
to  RSD. 

To  prove  finiteness  of  RSD,  the  following  properties  of  simplices 
from  Rockafellar  [1970]  are  necessary: 

Property  1:  If  x is  an  element  of  a p-simplex,  C,  then  x can  be  uniquely 

expressed  as  a convex  combination  of  the  points  zn,  z, , ...,  z 

0 1 P 

defining  C,  i.e. , 


x = Z a.z. 
1=0  1 


Z a.  = 1 
1=0 


at..  £ 0 i = 0,  1 , . . . , p 


and  Og,  ap  ...,  dp  are  unique. 

Property  2:  If  x is  an  element  of  a p-simplex,  C,  and  the  convex  weight, 
“i , for  some  i =0,  1,  ...,  p,  is  positive  in  the  (unique)  expression  of 
x as  a convex  combination  of  zQ,  z^  ...,  z , then  H(zQ,  z^  ...,  z^j, 
x,  zi+1 Zp)  is  also  a p-simplex. 

Property  3:  If  H(zQ,  Zl,  ...,  zp)  is  a p-simplex,  then  H(zQ,  Zj,  ..., 
zi_l»  zi+i>  •••»  zp)>  for  some  i =0,  1,  ...,  p,  is  a (p-1) -simplex. 
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We  need  the  following  additional  assumption: 

Assumption  3.4.1:  (3.2.1)  has  a unique  solution,  denoted  as  x*,  i.e., 
f(*)  is  strictly  pseudoconvex. 

Under  the  above  assumption  we  can  define  the  following  set: 

I = {ai  : 6f(x*)(ai  - x*)  = 0,  i = 1,  2 N} . 

We  refer  to  H(I  ) as  the  optimal  face  (see  Wolfe  [1970])  of  S. 

Note  that  Sf(x  )(a.  - x*)  > 0 for  all  a.  $ I*. 

The  following  two  lemmas  are  necessary  for  the  finiteness  of  RSD. 

Lemma  3.4.1:  Given  that  x"  € arg  min  {f (x)  : x 6 H (Zj , z v ...,  Zp)} 

where  each  z.  is  an  arbitrary  element  of  Rn,  and  x"  is  written  as 
P P 

^aizi»  where. Z a..  = 1,  and  0,  i = 1 p,  then 

6f(x")(z.  - x")  >0  implies  that  a.  = 0. 
u J 

Proof:  (By  contradiction)  Assume  that  ct.  > 0.  Define 

J 


and 


z = x"  + a.(x"  - Zj)/(1  - 0j) 

= x " / ( 1 - «j)  - -j^/d  - .j) 


P 

Z 


i=l  ifj 


v/t1  - “P 


. ,1.,  .“/U  - «))  = 0 - «,•)/(!  - «j)  = 1. 

1=1  ifj  J J J 

Therefore,  z is  an  element  of  H^,  z2,  ...,  z ) and  d = (z  - x") 


is  a feasible  direction.  However 
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6f  (x")d  = 6f(x")(z  - x")  = { a . 5 f ( x " ) ( x " - z - ) } / ( 1 - a-) 

J J J 

= { -ot j <5 f ( x " ) (Zj  - x *')}/(  1 - « ) 
< 0 


i.e.,  d = z - x"  is  a descent  direction,  a contradiction  to  the 

optimality  of  x".  Therefore,  a.  must  be  zero.! 

J “ 

Lemma  3.4.2:  Let  {x  } be  an  infinite  sequence  converging  to  x*,  the 
optimal  solution  to  the  problem  : min  {f(x)  : x € S} , where 

S - H(a^,  a^,  ...,  a^).  If  f(*)  is  continuously  differentiable,  then 
there  exists  an  integer,  ir,  such  that  for  any  k > ir  the  following  hold 

i)  5f(xk)(a.  - xk)  > 0 for  all  a.  / I*. 

J J 

ii)  min  {6f(xk)(y  - xk)  : y € S} 

= min  {5f(xk)(y  - xk)  : y € I*}. 

Proof:  For  each  a^  / I , the  continuity  of  <5f(»)  implies  that 

lim  6f(xk)(a  - xk)  = 6f(x*)(a,  - x*)  = r. 
k— >°°  J J J 

for  some  r.  > 0.  For  any  o such  that  0 < o < min  {r.  : a.  0 I*}  there 

J J 

must  exist  a positive  integer,  t,,  such  that  for  any  k > T • 

J J 

|5f(xk)(a.  - xk)  - 5f(x*)(a,  - x*)|  < (r . - o) 

^ I J J 

■(rj  " °)  < <sf(xk)(aj.  - xk)  - t.  < (r.  - o) 

0 < <5 f ( x k ) (3j  - Xk)  < 2 r . - 0. 


Then,  for  any  o between  0 and  r1  = min  {r.  : a-  / I*},  the 

J J 

following  holds  for  any  k > -rr  = max  {t.  : a.  & I* } 

J J 

5f(xk)(a.  - xk)  > o for  all  a.  / I*. 

J J 
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Since  o > 0,  this  proves  (i). 

Note  that  when  xk  is  not  optimal,  6f(xk)(yk  - xk)  must  be  negative. 

. * 

From  (i),  any  a^  £ I cannot  be  a solution  to  the  subproblem.  That  is 

min  {6f (xk ) (y  - xk)  : y € S} 

= min  {Sf(xk)(y  - xk)  : y 6 I*}. 8 


The  following  theorem  describes  the  behavior  of  the  algorithm  with 
respect  to  the  relation  between  Wk  and  I*,  and  holds  for  all  r M. 

Theorem  3.4.1:  If  RSD  generates  an  infinite  sequence  { xk } converging  to 

* 

x , then  there  exists  a positive  integer,  ir,  such  that  for  any  k > it  the 
following  hold: 

i)  The  incoming  extreme  point  at  iteration  k,  yk,  belongs  to  I*. 

k * 

n)  W$  is  a subset  of  I . 

Proof:  Let  tt  be  as  defined  in  Lemma  3.4.2,  i.e. , ir  = max  {t.  : a.  £ I*}. 

J J 


Then,  (i)  follows  directly  from  Lemma  3 . 4 . 2 ( i i ) . 


To  prove  (ii),  assume  that  for  some  k > tt,  Wk  is  not  a subset  of 

★ 

I . Without  lost  of  generality,  suppose  that  the  elements  a^,  a2,  ..., 

k * " 

aq  from  Wg  do  not  belong  to  I . Then,  Lemma  3 . 4 . 2 ( i ) implies  that  at 

the  end  of  Step  2 and  for  j = 1,  2,  ...,  q 


6f(xk+1)(a.  - xk+1)  > 0 

J 


which,  by  Lemma  3.4.1,  further  implies  that  the  weight  of  a.,  for  j = 1, 
2,  ...,  q,  in  the  expression  of  xk+*  as  a convex  combination  of  Wk+^ 
must  be  all  zero.  Therefore,  we  have  that  a^,  a2,  ...,  aq  satisfy  the 
column  dropping  criteria  in  Step  2,  which  means  that  the  set  Wk+1  will 
consist  entirely  of  elements  from  the  set  I . Moreover,  (i)  ensures 
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that  only  elements  from  the  set  I*  will  be  added  to  the  set  Wk  in 

s 

subsequent  iterations,  and  the  desired  result  is  obtained  by  letting  it 
equal  k+l.g 

The  result  of  the  lemma  below  implies  that  the  union  of  Wk  and 
{y  }>  the  solution  to  the  subproblem,  always  forms  a simplex.  This 
leads  to  the  result  of  Theorem  3.4.2  which  states  that,  for  k £ 0,  the 
set  W always  forms  a simplex.  Then,  it  is  shown  in  Theorem  3.4.3  that, 
when  r is  larger  than  the  dimension  of  H(I*),  the  algorithm  must  stop 
after  having  generated  a finite  number  of  simplices. 

Lemma  3.4.3:  Let  C = H(Zq,  z^,  ...,  zp)  be  a p-simplex  which  is  also  a 
subset  of  a convex  feasible  region  S.  Furthermore,  let  x'  6 arg  min 
{ f (x ) : x € C}  which  satisfies 

P 

x‘  = E 8 • z • 
i=0  1 1 

P 

E 8.  = 1 
i=0  1 

>0  i*0,  1 p. 

If  x1  does  not  minimize  f(x)  over  S,  then  there  exists  a y 6 S such 
that  6f(x')(y  - x')  < 0 and  H(zQ,  z^,  ...,  zp,  y)  is  a (p+1) -simplex. 
Proof:  (By  contradiction)  Assume  that  H(zQ,  Zj,  ...,  z y)  is  not  a 
(p+1) -simplex,  i.e.,  the  vectors:  zx  - zQ,  ...,  zp  - zQ,  y - zQ,  are 
linearly  dependent.  In  other  words,  there  exist  coefficients  o^. , for 
i = 1,  . . . , p,  satisfying 

y - Z0  = - zo) 
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P P 

y = (1  - E a.)zQ  + E a,Z. 

i=l  1 u i=l  1 1 

P P 

y = E a.z.  where  an  = 1 - E a.. 
i=0  1 1 u i=i  1 

Moreover,  it  follows  from  the  optimality  of  x1  over  C that 
'Sf(x')(z1  - x‘)  z 0 i = 0,  1,  p 
and,  from  Lemma  3.4.1,  this  implies  that 

6f(x,)(zi  - x')  = 0 i = 0,  1,  ...,  p 
since  0.  > 0 for  all  i.  However 

P 

0 > <5  f ( x ' ) (y  - x')  = <5f  (x  1 ) ( ( E a.z.)  - x') 

i=0  1 1 

P 

= 2 ot.[6f(x')(z.  - X1)]  = 0. 

i=0  1 1 

This  is  a contradiction  and  H(zQ,  Zj,  ...,  zp,  y)  must  be  a (p+1)- 
simpl ex. | 

Theorem  3.4.2:  In  RSD,  the  set  H(Wk)  is  a simplex  for  all  k * 0. 

Proof:  We  show  by  induction  that  H(Wk)  is  a simplex  at  the  start  of  Step 

1.  It  then  follows  that  H(Wk)  is  a simplex  at  every  step  of  RSD. 

When  k = 0,  W^  = {x^}  and  wj?  = 0.  Therefore,  W^  = {x^1}  and  H(W^) 

is  a 0-simpl ex . Assume  that  H(Wk)  is  a p-simplex  for  k ^ 0.  Then,  it 

k + 1 

must  be  shown  that  H(W  ) is  also  a simplex. 

Assume,  without  loss  of  generality,  that  Wk  = {an,  a, , . . . , a .} 

« U i p“i 

and  Wx  = {x 1 } , and  by  assumption  Wk  = Wk  U Wk  defines  a p-simplex.  The 
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elements  with  zero  convex  weight  have  been  discarded  at  the  end  of  the 
preceding  Step  2.  Therefore,  the  remaining  elements  in  Wk  at  the 

beginning  of  Step  1 must  have  positive  weights.  Since 

k k k i/i 

6f(x  )(y  - x ) * 0 implies  that  the  algorithm  must  stop  and  WK+1  is  not 

generated,  it  is  assumed  below  that  sf(xk)(yk  - xk)  < 0,  and  either  Step 

l(i)  or  1 ( i i ) will  be  executed  depending  on  the  cardinality  of  Wk. 

In  Step  l(i) , Wk+1  = Wk  U {yk},  and  H(Wk+1)  defines  a (p+1) -simplex 
because  of  Lemma  3.4.3. 

In  Step  1(11),  WM  = {a„,  ap  ....  a^,  yk,  aM,  ....  ap_j,  xk}, 

where  a.  has  the  minimum  weight.  In  this  case,  Wk+1  defines  a p-simplex 

because  H(aQ,  aj,  ...,  a j,  x',  yk)  is  a (p+1) -simplex  by  Lemma  3.4.3, 

k k 

H(a0»  ap  •••,  ap_i>  x , y ) is  a (p+1) -simplex  by  Property  2,  and  H(aQ, 

k k 

al»  •••»  ai-i»  ai+i»  •••»  ap-i>  x , y ) is  a p-simplex  by  Property  3. 


P- 
k+1. 


Thus,  in  either  case,  H(W  ) is  a simplex. 


At  the  end  of  Step  2,  all  zero  weighted  elements  of  Wk+^  are 
discarded  and  Property  3 guarantees  that  the  convex  hull  of  the 
remaining  elements  still  forms  a simplex.  Since  the  set  Wk+1  at  the  end 
of  Step  2 is  the  same  as  the  one  at  the  beginning  of  Step  1,  the  theorem 
is  proved. | 

^ ^ 

Theorem  3.4,3:  When  x is  unique  and  r £ dim  I + 1,  RSD  converges  to  x 

after  a finite  number  of  executions  of  Step  1. 

Proof:  (By  contradiction)  Assume  that  the  algorithm  generates  an 

k 

infinite  sequence  {x  } and,  because  of  the  uniqueness  of  the  solution, 

k *np  i 

{x  } must  converge  to  x . For  every  k,  H(WK)  is  a simplex  by  Theorem 

k 

3.4.2,  and  for  k > it,  W$  is  a subset  of  I , where  it  is  as  defined  in 
Theorem  3.4.1.  Since  H(Wk)  is  a simplex  and  Wk  = Wk  U Wk,  Wk  must  also 
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define  a simplex  by  Property  3.  Below,  it  is  shown  that,  at  the 
beginning  of  Step  1,  |Wk|  < r for  al  1 k > tt. 

Since  the  algorithm  does  not  stop  at  any  iteration  k > tt, 

5f(xk)(yk  - xk)  < 0 and  H(Wk  U {yk} ) is  a simplex  by  Lemma  3.4.3.  By 
Theorem  3.4.1(i),  yk  € I . Thus,  Wk  U {yk}  is  still  a subset  of  I , and 

|Wg|  “ 1 = dim  Wk  < dim  (Wk  U {yk})  s dim  I* 

|Wk|  < dim  I + 1 s r 

This  last  inequality  implies  that  Step  1 ( i i ) is  never  executed 
after  the  tt  iteration,  and  W is  a subset  of  I U W*,  for  all  k > tt. 

Moreover,  the  objective  function  values  at  the  iterates  xk  strictly 

decrease  after  each  iteration,  thereby  implying  that  the  set  Wk  cannot 
be  repeated  in  the  sequence  {Wk} This  is  impossible,  since  there 
are  only  a finite  number  of  distinct  subsets  of  I*  U wJJ  with  size  r+1  or 
less.  So,  the  algorithm  must  terminate  after  a finite  number  of 
executions  of  Step  l.| 

3.5  Master  Problem 

The  master  problem  of  (3.2.1)  may  be  formulated  as  follows 

min  f (Akw) 
s 

s.t.  z vi.  = 1 (3.5.1) 

w1  > 0 i = 1 s 
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where  A - [a^,  a2>  ...»  as]  is  the  matrix  whose  columns  are  the 

elements  of  W , w = [w^ , W2,  ws]^  are  the  associated  weights,  and 
s s r + 1. 

Since  the  master  problem  is  solved  to  optimality  and  Lemma  3.3.2 
shows  that  after  each  iteration  of  RSD  the  objective  function  value 
decreases,  the  weight  corresponding  to  the  last  generated  extreme  point 
must  be  positive.  The  theorem  below  uses  this  fact  to  eliminate  the 
equality  constraint  from  (3.5.1)  and  reduce  its  dimension  by  1. 

Theorem  3.5.1:  Given  that  w s > 0,  where  a$  is  the  last  generated  extreme 
point,  and  f(»)  is  pseudoconvex,  (3.5.1)  is  equivalent  to 


min  h(z)  = f(Bkz  + a ) 

s-t*  zi  * 0 i = 1.  ....  s-1  (3.5.2) 

where  Bk  = ^ - a$,  a£  - a$,  ...,  a$_1  - a$]. 

Proof:  The  Karush-Kuhn-Tucker  (KKT)  conditions  of  (3.5.1)  are 


k.  * 


6if(A  w ) + y + Ti  = 0 
* 

i 1 ) • • • ) s 

(3.5.3) 

Vi  = 0 

i = 1 c 

■ a ) • » • j 0 

(3.5.4) 

T..  £ 0 

★ 

and  w feasible. 

1 $ • • • j s 

(3.5.5) 

* * 

ws  > 0 and  wst$  = 0 imply  that  t$  = 0.  In  addition,  t = 0 and 

k * u * 

6$f(A  w ) + u + = 0 imply  that  y = -6$f(AKw  ). 

The  KKT  conditions  of  (3.5.2)  are 


5 -j h (z  ) + t!  = 0 


* 

:i"’i 


Z,.T*  = 0 


i = 1 , . . . , s-1 
i = 1 , . . . , s-1 


(3.5.6) 

(3.5.7) 
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i = 1 , . . . , s-1 


(3.5.8) 


and  z feasible. 


Since  a^z*)  = 5 1-f(Akw*)  - 5sf(Akw*)  for  i = 1, 


• • • J 


s-1,  (3.5.6) 


may  be  rewritten  as 


«if(Akw*)  - 5sf(Akw*)  + t]  = 0 


i = 1,  ....  s-1  (3.5.9) 

i = 1,  ....  s-1.  (3.5.10) 


• • • 9 


k * 

or  5 .f  (A  w ) + y + x*.  = 0 


Note  that  (3.5.7),  (3.5.8)  and  (3.5.10)  are  also  the  KKT  conditions 


Since  the  master  problem  must  be  solved  repeatedly,  it  is  important 
that  a super! inearly  convergent  method  be  used,  and  this  requires  the 
storage  of  an  adequate  approximation  of  the  Hessian  matrix.  However, 
this  matrix  is  at  most  rxr  in  RSD.  Further,  the  constraints  are  simple, 
making  the  projected  Newton  method  of  Bertsekas  or  its  quasi-Newton 
versions  a natural  choice. 

3.5.1  Bertsekas  Projection  Method  [Bertsekas  19821 

The  projection  method  of  Bertsekas  for  (3.5.2)  is  an  iterative 
procedure  of  the  following  form 


* 

for  (3.5.1).  Thus,  if  w is  an  optimal  solution  of  (3.5.1),  then 


* * T 

[Wp  •••,  ws_^]  will  be  an  optimal  solution  of  (3.5.2) .g 


(3.5.11) 


where  [*]+  denotes  projection  on  the  positive  orthant,  ak  is  a stepsize 
chosen  by  an  Armijo-like  rule  and  Dk  is  a positive  definite  symmetric 
matrix  which  is  partly  diagonal.  The  matrix  Dk  can  be  computed  on  the 
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basis  of  first  or  second  derivatives  of  h(«)  so  that  the  resulting 
method  becomes  a quasi-Newton  or  a Newton  algorithm. 

The  following  two  assumptions  are  necessary  for  the  convergence  of 
the  sequence  {zk}  to  a critical  point: 

Assumption  3.5.1:  The  gradient  6h(«)  is  Lipschitz  continuous  on  each 
bounded  set  of  Rn,  i.e.,  given  any  bounded  set  S in  Rn,  there  exists  a 
scalar  L (depending  on  S)  such  that 

I! 5 h ( z 1 ) - 6h(z2)|  S L||z1  - z2||  for  any  Zj,  z2  € S. 

Assumption  3.5.2:  There  exists  positive  scalars  tj,  and  nonnegative 
integers  q^,  q2  such  that 

Ti(wk)  1 1 x | 2 s xDkx  S x2(wk)q2||xl|2  for  all  x € Rn,  k = 0,  1,  ... 

where  w = ||zk  - [zk  - M6h(zk)]  [[  and  M is  a positive  definite  diagonal 
matrix. 

The  algorithm  described  below  utilizes  a scalar  € > 0 (typically 
small),  a fixed  diagonal  matrix  M (for  example  the  identity),  and  two 
parameters  8 £ (0,  1)  and  o € (0,  that  will  be  used  in  conection  with 
an  Armijo-like  stepsize  rule. 

Step  0:  Let  z°  be  a feasible  point.  Set  k=0. 

Step  1:  Select  a positive  definite  matrix  Dk  which  is  diagonal  with 
respect  to  the  set  Ik  given  by 
1+  = O'  : 0 * zk  S 6k,  6ih(zk)  > 0,  i = 1,  ....  s} 
where  €k  = min  {€,  wk)  and  wk  = ||zk  - [zk  - M6h(zk)]+||. 


Go  to  2. 
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Step  2:  Denote  pk  = Dk6h(zk) 

and  zk(a)  = [zk  - apk]  + for  all  a z 0. 

k+1 

Find  z given  by 

zW  = zk(ck) 

where  ak  = 8m  and  m is  the  first  nonnegative  integer  such 
that 

h(zk)  - h(zk(0m) ) * 

o {8m  Z k 6.h(zk)pk  + Z . 6.h(zk)[zk  - zk(f?m)]}. 
ijel*  i€lj 

k 1 k k+1 

If  h(z  ) = h(z  ),  z is  a solution  and  terminate. 
Otherwise,  set  k=k+l  and  go  to  1. 

The  stepsize  rule  may  be  viewed  as  a combination  of  the  Armijo-like 

rule  [Bertsekas  1976]  and  the  Armijo  rule  usually  employed  in 

unconstrained  minimization  (see,  e.g.,  Luenberger  [1984]). 

The  following  assumption  guarantees  that  if  the  algorithm  generates 
k 

a sequence  {z  } with  a limit  point  z , then  the  set  of  active 

★ 

constraints  at  z is  identified  in  a finite  number  of  iterations  and  the 
algorithm  is  equivalent  to  an  unconstrained  optimization  method 
restricted  on  T,  the  subspace  of  binding  constraints  at  z*. 

«Uf 

Assumption  3.5.3:  The  local  minimum  z of  min  { h ( z ) : z z 0}  is  such 
that  for  some  6 > 0,  h(*)  is  twice  continuously  differentiable  in  the 
open  sphere  {z  : ||z  - z ||  < 6},  and  there  exist  positive  scalars  m^  m2 
such  that 

"ilMI2  * xs2h(z)x  * m2|)x||2 
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for  all  z such  that  ||z  - z ||  < <$,  and  x f 0 such  that  x.  = 0 for  all 

* 

i € B(z  ),  where  B(z)  = {i  : = 0}.  Furthermore, 

<5 -jh (z*)  > 0 for  all  i 6 B(z*). 

By  Assumption  3.5.3,  <52h(z  ) is  positive  definite  on  T for  k 
sufficiently  large.  This  implies  that  if  the  portion  of  the  matrix  Dk 
corresponding  to  the  indices  i & Ik  is  chosen  to  be  the  inverse  of  the 
Hessian  or  an  approximation  of  the  inverse  of  the  Hessian  of  h(*)  with 
respect  to  these  indices  then  the  algorithm  eventually  reduces  to  a 
Newton  or  quasi-Newton  method  restricted  on  the  subspace  T.  This  type 
of  argument  can  be  used  to  construct  a number  of  Newton  or  quasi-Newton 
methods  and  prove  corresponding  convergence  and  rate  of  convergence 
results.  The  following  are  two  forms  of  the  matrix  Dk  that  have  been 
used  in  our  computational  experiments: 
i ) Newton  Method: 


D 


k 


(Hk)'1 


where  H is  the  matrix  with  elements  H^ . given  by 


r 


0 


2 k 

i jh  (z  1 


if  if  j and  either  i € Ik  or  j € Ik 
otherwise. 


i i ) Quasi-Newton  Method : 


D 


k 


-1 
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where  Q is  the  matrix  with  elements  Q* . given  by  the  BFGS  formula 

J 

[Dennis  and  More  1977].  We  first  recall  the  BFGS  formula: 


ykvkT 

Bk+1  = Bk  + 


77 


Bksk(Bksk)T 

skBksk 


where 

and 


sk  = zk+1  - zk 

yk  = 5h(zk+1)  - 6h(zk) 


Define  ok  = yksk.  It  is  known  [Powell  1978]  that  if  Bk  is  positive 

definite,  so  is  Bk  ^ if  and  only  if  ok  > 0.  To  preserve  the  positive 
k . k 

definiteness  of  B , if  o s 0,  we  generalize  the  BFGS  formula  as  in 
Powell : 


k kT 


Bk+1  = Bk  + 


T T 


k k 

T S 


Bksk (Bksk) T 
skBksk 


with 


$ = 


0.8  skBksk 
skBksk  - yksk 


k _ , 

T = < 


.k  k M .k,  Dk^k 

$y  + (1  - { ) B S 


if  o k S 0.2  skBksk 


otherwise. 


Now,  we  define  QK  by 


r 

0 


if  i f j and  either  i € Ik  or  j € Ik 
otherwi se. 
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3. 6 Imp! ementation 

A broad  class  of  real  world  problems  have  been  solved  to  test  the 
basic  RSD  algorithm.  The  test  problems  consist  of  the  linearly 
constrained  Colville  problems  [Colville  1968]  used  by  von  Hohenbalken 
[1977]  to  test  his  APL  code  for  SD,  several  traffic  assignment  models 
which  range  up  to  2836  arcs  and  which  have  become  standard  test 
problems,  the  667  node  Dallas  water  distribution  problem  [Collins  et  al . 
1978],  and  some  simple,  but  large,  electrical  networks  which  were 
obtained  from  RCA  physicists  studying  the  properties  of  conducting 
fibers  imbedded  in  an  insulating  polymer  [Balberg  et  al . 1983].  The 
test  problems  required  different  codes  for  the  linear  subproblems, 
according  to  the  type  of  model.  For  the  Colville  problems,  a local  LP 
code  was  developed.  This  code  allows  starting  each  subproblem,  after 
the  first,  from  the  optimal  solution  of  the  prior  iterate.  This  is  an 
obviously  important  feature  for  efficiency  of  the  decomposition.  For 
the  traffic  assignment  problems,  the  subproblems  become  shortest  path 
problems  [Nguyen  1976]  and  a code  of  Dijkstra's  method  by  Nguyen  and 
James  [1975]  was  employed.  Finally,  the  convex  single  commodity  minimum 
cost  flow  problems  have  as  subproblems  their  linear  version  for  which 
there  exist  many  algorithms.  For  the  experiments  below,  the  primal 
simplex  code  NETFLO  [Kennington  and  Helgason  1980],  modified  to 
accommodate  real  variables  and  to  allow  starting  from  the  prior 
solution,  was  chosen.  In  some  instances,  e.g.,  the  electrical  network 
models,  the  subproblem  may  also  be  solved  by  a shortest  path  code. 
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Both  the  water  distribution  and  electrical  network  problems  have 
unconstrained  variables  in  their  formulations  and  this  implies  that  the 
subproblem  can  be  unbounded.  However,  the  solution  for  each  problem  is 
known  to  lie  in  the  convex  hull  of  some  extreme  points.  To  illustrate 
these  cases,  consider  the  problem 

n 

min  z f.(x.) 
i=l  1 1 

s.t.  A x = b (3.6.1) 

where  x 6 Rn,  b € Rm  and  A is  an  mxn  node-arc  incidence  matrix. 

Consider  the  following  assumption: 

Assumption  3.6.1:  (•)  is  strictly  convex,  differentiable  and  symmetric 

about  zero,  i.e.,  fi (x-)  = fi(-x.)  for  i = 1,  2,  ...,  n. 

The  subproblem  of  (3.6.1)  in  RSD  is  defined  by 

n 

min  X 6 f . ( x . ) y . 
i=l  1 1 1 

s.t.  A y = b.  (3.6.2) 

Note  that  (3.6.2)  may  be  unbounded  due  to  the  existence  of  negative 
cycles  which  may  render  some  inefficiency  in  RSD.  Although  the  problem 
with  undirected  arcs  can  be  remedied  by  replacing  each  undirected  arc 
with  two  directed  arcs,  the  problem  of  negative  cycles  still  exists.  An 
alternative  solution  is  to  place  artificially  large  upper  bounds  on  all 
arcs  which  could  lead  to  the  following  undesirable  properties:  i)  slow 
progress  in  the  first  few  iterations,  and  ii)  artificially  low  lower 
bounds  on  the  optimal  objective  function  value. 
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An  alternative  to  solve  these  problems  is  to  replace  the  standard 
Subproblem  (3.6.2)  by  a new  one  that  always  generates  an  extreme  point 
and  produces  descent  directions.  The  following  theorem  gives  two 
problems  equivalent  to  (3.6.1),  the  subproblems  of  which  may  be  used 
instead  of  (3.6.2).  By  equivalent  we  mean  that  an  optimal  solution  in 
one  induces  an  optimal  solution  for  the  others. 

Theorem  3.6.1:  Under  Assumption  3.6.1,  the  Problem  (3.6.1)  is 
equivalent  to  the  following  two: 

n 

min  E f.(xt  + xT) 
i=l  1 1 1 

s.t.  A(x+  - x')  = b (3.6.3) 

xt,  xT  > o i = 1,  ...,  n 
and 

n 

min  z [fi(xt)  + f i (xT)  - f.(0)] 

s. t.  A(x  - x ) = b (3.6.4) 

xt,  xT  > 0 i = 1,  ...,  n. 

s 

£roof:  We  need  to  show  that  (3.6.1)  is  equivalent  to  (3.6.3)  and  to 
(3.6.4).  However,  in  the  interest  of  brevity,  we  will  prove  only  the 
first  equivalency,  since  the  second  can  be  done  in  a similar  manner. 

Let  (x+,  x")  be  an  optimal  solution  to  (3.6.3).  Then,  we  establish 
the  following  two  properties: 

i)  xt  x.  = 0 for  i * 1,  ....  n. 

ii)  x = x+  - x~  is  an  optimal  solution  to  (3.6.1). 
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To  prove  (i),  assume  the  converse,  i.e.,  there  exists  an  index  j 
such  that  xt  > o and  xT  > 0.  Define 

J J 

zt  = max  {0,  xj  - xT} 
zT  = max  {0,  xT  - xt} 

and  let 

Zj  = xt  and  = x^  for  all  i =f=  j. 

Then,  by  definition  of  (z+,  z") 


+ 


< 


z.  < x.  and 

J J 


= 0. 


Moreover,  due  to  the  strict  convexity  and  symmetry  of  f.(») 


Vzo  * 9 < VX1  * *? 

and  ^.(z*  ♦ zT)  = f.(x*  * xT) 


for  all  i =j=  j. 


That  is,  (z  , z ) is  a better  solution  which  satisfies  property 
(i),  and  we  have  a contradiction. 

To  prove  (ii),  assume  the  converse,  i.e.,  x is  not  an  optimal 
solution  to  (3.6.1).  Let  z be  an  optimal  solution  to  (3.6.1).  Then 

Define 


zi  = max  {0,  z.}  and  zT  = max  {0,  -ziJ  for  i = 1 n. 

Then,  (z+,  z ) is  feasible  to  (3.6.3)  and 
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* zi)  = = .^(xt  * xT). 

But  this  means  that  (x  + , x ) cannot  be  optimal  to  (3.6.3)  which  is 
a contradiction. 

Let  x be  an  optimal  solution  to  (3.6.1).  Define 


+ 

X.  = 


i = max  {°»  xj}  and  xi  = max  {0,  -xi)  for  i = 1 , ....  n. 

Then,  (x  , x ) is  feasible  to  (3.6.3).  Assume  that  (x+,  x")  is  not 
optimal  to  (3.6.3),  i.e.,  there  exists  (z  + , z")  feasible  to  (3.6.3)  such 
that 


■ Vi(zi)  s .r,fi(2i  * 2i)  < 2 f(x*  ♦ xT)  = i f(x.) 

i=l  i=l  i=l  1 1 i=i  1 

where  z = z - z and  z is  feasible  to  (3.6.1).  This  is  a contradiction 
since  x is  optimal  to  (3.6.1).| 

Now,  consider  the  subproblems  of  the  two  problems  equivalent  to 
(3.6.1)  at  a point  (x+,  x“)  or  x = x+  - x" 


min  z «fi(|x1|)y*+  E «f1(|x1|)y‘ 
i=l  i=l 

s.t.  A (yt  - y-)  = b 

y+>  y-  * o i = 1 n 


(3.6.5) 


min 


s.t. 


+ % + 


1.f1'5fi(x1)yi  * .2i«f1(xi)yj 

A (y^  - Yj)  = b 

+ — 

y i » y i £ 0 i = 1 , . . . , n . 


(3.6.6) 
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Note  that  (3.6.5)  and  (3.6.6)  always  have  a bounded  solution  since 
the  cost  coefficients  are  always  nonnegative  due  to  Assumption  3.6.1. 

In  theory,  we  can  solve  Problem  (3.6.1)  by  solving  either  Problem 
(3.6.3)  or  (3.6.4)  instead.  However,  this  is  not  quite  efficient  since 
the  number  of  variables  doubles  in  converting  Problem  (3.6.1)  to  (3.6.3) 
or  (3.6.4).  What  would  be  more  efficient  is  to  replace  Step  1 of  RSD  by 
the  following 


Step  1:  (Subproblem) 
Solve 


or 


min 


,Vf1(lXll)y1  * ,Vfi(lXil)yi 

1=1  1=1 


s.t.  A (yi  - yT)  = b 

yT,  y]  * o i = l,  ....  n 


min  z 5f.(x‘  )y  • + Z <5f.(x*  )y. 

i=l  1 i=i  1 1 1 

s.t.  A (yT  - yT)  = b 

yT,  yT  ^ 0 i = 1,  ...,  n 


k+ 


where  xlT  = max  {0,  xT}  and  x^"  = max  {0,  -x*}. 
If  .fj  * (y*?--  x1?’))  2 o 


k+, , k+  k+ 


[or  ^ xY)  ♦ s «f,(x^)(y-"-  x!f_)  2 0], 
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k 

x is  a solution  and  terminate. 

Otherwise,  set  yk  = yk+  - yk  and  continue  with  options  (i) 
and  (ii)  of  RSD. 


The  convergence  of  the  modified  RSD  algorithm  follows  the  standard 
convergence  proof,  given  that  yk  in  Step  1 does  in  fact  provide  a 
descent  direction.  The  theorem  below  demonstrates  that  both  Subproblems 
(3.6.5)  and  (3.6.6)  do  provide  a descent  direction. 

Theorem  3.6.2:  For  a given  x 6 S,  the  following  hold 


1)  I «f1(lx1IH(yl  - x*)  * (yT  - xt))  i i «fi(x.)(yi  - x,) 

I — i i = 1 


ii)  I «ff(xt)(y*  - xt)  . I Sf  (x^)(y-  - xt)  2 l 6f.(xi)(y. 
1-1  1=1  1 i=l  1 1 ' 

Proof:  Define  I+=  {i  : x.  * 0,  i = 1 n}  and 

H • xj  < i = •••,  n}.  Then,  (i)  follows  from 


xi) 


^Sf^lXjIHfy*  - xt)  * (yt  - x')>  - Z 5f1fx1Ky1  - x,) 


= 5fi(|xjl)(y1  - x,.)  ♦ ^ Sfidx^jyt 


* 4j  sfi(|Xil)y^  * ^ ( Ixi I ) (yT  - xT) 


iei 


- [.Ij  «f,(|xi|)(yt  - xt)  . I •Sf1C|x1|)(yt  - xt) 


E SfiUxil)yi  - I «ff{|x1|)y* 
i€I  i€I 


and  (ii)  follows  from 


+ , , + + 


1f1sfi(xi)fy1  " xd  + HyT  - XP  - r (*1)fy1  - x,; 


l,  <fi(|Xil)K  - XT  * .|j  sfidxil)(yi  - Xl) 


1€I 


5fi(lxil)(yj  - x()  * <Sy1(|x1|)(y'  - xT) 


i€I 


- 4, 5fidxil)yl  - 4r  sfidxil)yl 


= I Sf.-dx  |)yT  * Z 6f  (|x  |)y;  2 O.g 

itl+  i€I_ 

Assume  that  (y+,  y ) and  (z+,  z")  are  solutions  for  (3.6.5)  and 
(3.6.6),  respectively.  The  theorem  below  shows  that  (3.6.6)  gives  a 
steeper  direction  than  (3.6.5). 

Theorem  3.6.3:  For  a given  x € S,  the  following  hold 

if15fi(xi)(zi  - xi)  * .f15fi(xi)(yi  - V 

where  y = y+  - y"  and  z = z+  - z". 

Proof:  It  is  sufficient  to  show  that 

1f15fdxdzi  2 

Without  loss  of  generality,  assume  that  i 0 for  i = 1 n, 

If  not,  we  can  define  a new  node-arc  incidence  matrix  A changing  the 
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sign  of  the  columns  corresponding  to  the  negative  variables. 

Since  (y+,  y")  is  optimal  to  (3.6.5) 
n + _ n 

i516fi(xi)(yi  + yi ^ * f 6fi (xi) (z^  + zT).  (3.6.7) 

Since  (z  , z ) is  optimal  to  (3.6.6)  and  x.  £ 0 implies  that 
5fi(x-j)  = 0 for  i = 1,  n.  It  follows  that 


n 

l 

i=I 


4isfi(xPzl 5 

jz16fi(xi)(zi  - zl>  x .z1sfi(xi)(yl  - zI>- 

From  (3.6.7)  and  (3.6.8) 


(3.6.8) 


n + _ n n 

145fi<xiHzi  - zi>  s - z,0  * is^Xjllz*  * zT) 

n 

- .^Vx-iKy)  * y,0 

n . » 

' 1f1Sfi fXi Jyi  ' iZ15fi(xi)yi 

= ft  «fi(x,)(y;  - y,0-S 
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3.7  Computational  Results 
3.7.1  Colville  Problems 

The  three  linearly  constrained  Colville  problems,  numbers  1,  4 and 
7,  were  solved  on  a VAX  11/750  computer.  For  the  formulations,  see 
Colville  [1968]  or  the  text  by  Himmelblau  [1972].  The  RSD  results 
corresponding  to  the  projected  Newton  and  quasi-Newton  methods  are 
displayed  in  Tables  3-1  and  3-2,  respectively,  showing  the  results  of  50 
iterations  for  different  values  of  r.  The  first  of  these,  the  "Shell 
Problem,"  has  five  nonnegative  variables  and  ten  additional  linear 
constraints,  the  objective  function  is  the  sum  of  linear,  quadratic  and 
cubic  terms.  The  optimal  objective  value  is  -32.34868,  as  verified  by 
solving  the  "Shell  Dual",  Colville  problem  2.  When  r=2,  both  methods 
obtain  the  optimal  value  in  four  iterations,  whereas  the  Frank -Wolfe 
method  exhibits  slow  convergence  [Wolfe  1970]:  to  produce  the  objective 
accurate  to  seven  digits,  the  projected  Newton  version  required  over 
25000  iterations!  The  CPU  times  are  approximate  because  the  VAX  clock 
does  not  account  for  the  system  load. 

Results  for  "Wood's  Problem"  are  in  Tables  3-1 ( b)  and  3-2 (b) . It 
has  four  variables  in  the  objective  which  is  a nonconvex  polynomial  of 
degree  four.  The  only  constraints  are  simple  upper  and  lower  bounds  on 
all  variables.  The  unconstrained  minimum  occurs  within  this  region  and 
the  optimal  value  is  zero.  Furthermore,  there  are  both  easy  and  hard 
starting  points  for  the  problem.  The  start  shown  is  "hard"  in  the  sense 
that  some  methods  will  converge  to  a nonoptimal  point  (see  Colville 
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[1968]).  Since  the  projected  Newton  method  requires  a positive  definite 
Hessian  matrix  in  solving  the  master,  the  diagonal  elements  of  the  true 
Hessian  were  (heuristically)  increased  at  any  master  iteration  where 
this  was  not  the  case.  It  is  somewhat  remarkable  that  all  four  versions 
converged  to  the  optimal  solution.  However,  for  r=l  and  r= 2,  the  total 
number  of  iterations  to  reduce  the  objective  function  below  10~7  was 
16880  and  6920,  respectively.  For  r=3,  it  required  20  iterations,  and 
for  r-4,  just  8.  The  projected  quasi-Newton  method  performed  much 
better  than  the  projected  Newton.  For  r=2,  it  required  4 iterations, 
and  for  r=3,  only  3. 

The  final  Colville  problem  has  16  variables,  with  eight  linear 
constraints  plus  lower  and  upper  bounds  on  all  variables.  The  objective 
(to  be  maximized)  is  fourth  order.  Tables  3-l(c)  and  3-2(c)  show  that 
both  methods  are  comparable  and  demonstrate  that,  again,  the  performance 
of  RSD  improves  steadily  as  r increases. 

3.7.2  Traffic  Assignment  Problems 

Many  models  very  much  like  the  model  presented  here  have  been 
developed  over  the  last  35  years.  The  earliest  examples  of  this  kind 
are  due  to  Beckmann  [1951]  and  Wardrop  [1952],  and  later  by  Dafermos  and 
Sparrow  [1969],  Nguyen  [1974],  LeBlanc  et  al . [1975]  and  many  others. 

The  model  considered  is  the  following  multicommodity  network  flow 
problem 
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Table  3-1:  RSD  Results  for  Three  Colville  Problems  (Newton  method). 


Iter. 

Objective  Function  Value 
r=l (FW)  r=2 

(a)  Shell  Problem 

1 

-15.55724 

-15.55724 

2 

-21.66206 

-27.72511 

Init.  Solution: 

3 

-24.51198 

-32.28357 

(0,0, 0,0,1) 

4 

-27.49879 

-32.34868 

5 

-28.59594 

10 

-31.05066 

20 

-31.93449 

30 

-32.10358 

50 

-32.21180 

Total  CPU  sec. 

4.29 

0.82 

Total  # of  Proj. 

92 

14 

Iter,  to  7 digits 

accuracy 

>25000 

4 

Iter. 

r=l (FW) 

r=2 

r=3 

r=4 

(b)  Wood's  Problem  1 

8.839690 

8.839690 

8.839690 

8.839690 

2 

8.465989 

7.877027 

7.877027 

7.877027 

Init.  Solution: 

3 

8.254216 

7.876967 

7.876967 

7.876967 

(-3,-1, -3,-1) 

4 

8.119669 

7.876967 

7.876967 

7.876966 

5 

8.036823 

7.876967 

7.873999 

6.261434 

8 

7.924362 

7.876967 

6.959996 

0.000000 

10 

7.898680 

7.876967 

1.303893 

20 

7.877552 

7.876967 

0.000000 

30 

7.876988 

7.876967 

50 

7.876967 

7.876967 

Total  CPU  sec. 

3.53 

18.23 

6.30 

2.81 

Total  # of  Proj. 

130 

239 

149 

67 

Iter,  to  7 digits 

accuracy 

16880 

6920 

20 

8 

Iter. 

r=l (FW) 

r=2 

r=3 

r=4 

(c)  Gauthier's  Prob.  1 

-245.6341 

-245.6341 

-245.6341 

-245.6341 

2 

-245.2229 

-245.2223 

-245.2223 

-245.2223 

Init.  Solution: 

3 

-245.0936 

-245.0846 

-245.0147 

-245.0147 

(0,0,.. .,0) 

4 

-245.0524 

-245.0252 

-245.0043 

-244.9894 

5 

-245.0354 

-244.9899 

-244.9917 

6 

-245.0325 

-244.9896 

-244.9894 

10 

-245.0020 

-244.9895 

11 

-244.9984 

-244.9894 

20 

-244.9909 

30 

-244.9897 

50 

-244.9896 

Total  CPU  sec. 

19.21 

6.91 

5.68 

4.16 

Total  # of  Proj. 

53 

18 

13 

8 

Iter,  to  7 digits 

accuracy 

158 

11 

6 

4 
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Table  3-2:  RSD  Results  for  Three  Colville  Problems  (Quasi-Newton 
method ) . 


Iter. 

Objective  Function  Value 
r=l(FW)  r=2 

(a)  Shell  Problem 

1 

-15.55724 

-15.55724 

2 

-21.66206 

-27.72511 

Init.  Solution: 

3 

-24.51198 

-32.28357 

(0,0, 0,0,1) 

4 

-27.49879 

-32.34868 

5 

-28.59594 

10 

-31.05066 

20 

-31.93449 

30 

-32.10356 

50 

-32.21175 

Total  CPU  sec. 

4.48 

0.80 

Total  # of  Proj. 

165 

19 

Iter,  to  7 digits 

accuracy 

- 

4 

Iter. 

r=l(FW) 

r=2 

r=3 

(b)  Wood's  Problem  1 

158.188639  158.188639 

158.188639 

2 

144.103277 

0.961506 

0.961506 

Init.  Solution: 

3 

21.531098 

0.226265 

0.000000 

(-3,-1, -3,-1) 

4 

11.135727 

0.000000 

5 

9.862158 

10 

8.061648 

20 

7.880695 

30 

7.877078 

50 

7.876967 

Total  CPU  sec. 

4.53 

0.53 

0.49 

Total  # of  Proj. 

201 

20 

15 

Iter,  to  7 digits 

accuracy 

- 

4 

3 

Iter. 

r=l (FW) 

r=2 

r=3 

r=4 

(c)  Gauthier's  Prob.  1 

-245.6341 

-245.6341 

-245.6341 

-245.6341 

2 

-245.2229 

-245.2223 

-245.2223 

-245.2223 

Init.  Solution: 

3 

-245.0936 

-245.0846 

-245.0147 

-245.0147 

(0,0,. ..,0) 

4 

-245.0524 

-245.0252 

-245.0043 

-244.9896 

5 

-245.0354 

-244.9899 

-244.9917 

-244.9894 

6 

-245.0325 

-244.9896 

-244.9894 

8 

-245.0082 

-244.9894 

10 

-245.0020 

20 

-244.9909 

30 

-244.9897 

50 

-244.9895 

Total  CPU  sec. 

17.75 

5.39 

4.77 

4.71 

Total  # of  Proj. 

95 

32 

26 

23 

Iter,  to  7 digits 

accuracy 

“ 

8 

6 

5 
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min 


fa(t)  dt 


s.  t. 


xa  = E xa 
a k a 


A xk  = bk 


* 0 


for  all  a 

for  al  1 k € D 
for  all  k € D 


where  D is  the  set  of  origin/destinations  (commodities),  x is  the  flow 

t h k 

of  the  k commodity  on  link  a,  xg  is  the  aggregate  flow  on  link  a,  x 

+■  h 

is  the  vector  of  flows  of  the  kLn  commodity,  and  f (•)  is  a convex 

d 

nondecreasing  function  that  represents  the  travel  time  on  link  a.  The 
optimal  solution  of  this  problem  is  a set  of  flows  satisfying  Wardrop's 
"user-equil ibrium  principle".  According  to  this  principle,  the  cost 
(time)  of  every  utilized  path  between  an  origin  and  a destination  is  a 
constant  no  greater  than  the  cost  of  any  nonutil ized  path.  Of  course, 
the  cost  of  a path  is  the  sum  of  the  link  costs  for  links  in  the  path. 
Table  3-3  summarizes  the  results  on  four  traffic  assignment  problems 
from  the  literature,  two  of  which  are  based  on  street  networks  for  the 
Canadian  cities  of  Hull  and  Winnipeg.  In  the  Table  the  size  of  each 
network  is  displayed  as  nodes/links/centroids.  A centroid  is  a node 
which  is  either  an  origin  or  a destination  of  some  fixed  travel  demand. 
The  absence  of  upper  bounds  on  the  variables  causes  the  linear 
subproblem  to  decompose  into  a set  of  shortest  path  problems,  one  for 
each  commodity. 

The  quantity  LB  associated  with  each  problem  is  the  largest  lower 
bound  generated  by  RSD  using  the  tangent  plane  inequality  (equivalently. 
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the  duality  gap,  see  Hearn  [1982]).  All  problems  were  run  in  single 
precision  FORTRAN  on  an  IBM  3081D  which  has  an  accurate  CPU  clock.  The 
reason  for  termination  of  each  run  is  shown  in  the  final  column,  whether 
maximum  allowed  iterations  (IT),  maximum  CPU  seconds  (CPU),  relative 
error  within  the  present  tolerance  (RE),  or  three  consecutive  master 
iterations  could  not  decrease  f(«)  with  a step  of  at  least  10"^  in  the 
projected  direction  (ND). 

Table  3-4  compares  the  performance  of  RSD  with  the  projected  Newton 
method  (RSD^)  and  the  RSD  code  of  Lawphongpanich  and  Hearn  [1983]  (RSD,,) 
for  the  two  small  traffic  networks  of  Table  3-3.  It  is  clear  that  RSD^^ 
outperforms  RSD£  in  terms  of  total  number  of  projections,  CPU  time,  and 
accuracy  of  the  objective  function  value. 

In  the  traffic  assignment  problems,  the  majority  of  the  effort  (80% 
- 90%)  is  spent  on  the  shortest  path  calculations.  Using  the  number  of 
shortest  path  calculations  as  the  sole  criterion,  RSD  compares  very 
favorably  with  other  algorithms  (see  Table  3-5).  In  fact,  there  is  only 
one  instance  in  which  another  method,  RESTRICTION  on  the  Hull  network, 
outperforms  RSD.  However,  as  described  in  Guelat  [1983],  RESTRICTION 
can  be  considered  as  a variant  of  RSD  which  allows  for  restarting  at 

J.  L. 

every  kcn  iteration. 

Considering  the  size  of  the  Hull  and  Winnipeg  networks,  it  may  be 
surprising  that  RSD  performs  well  with  a relatively  small  value  of  r. 
This  is  partially  explained  by  the  fact  that  solutions  to  traffic 
assignment  problems  involve  a small  number  of  utilized  paths.  Since 
there  is  a one-to-one  correspondence  between  paths  and  extreme  points,  a 


Table  3-3:  RSD  Results  for  Traffic  Assignment  Problems  (Newton  Method). 
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Table  3-4:  Comparison  between  the  new  RSD  code  (RSD,)  and  the  RSD  code 
of  Lawphongpanich  and  Hearn  [1983]  ( RSD 2 J . 


Probl em 

r 

CPU 

Sec. 

RSD1 

No.  No. 
Iter.  Proj 

Final 
. 0b  j . 

CPU 

Sec. 

RSC 

No. 

Iter. 

'2 

No. 

Proj. 

Final 

Obj. 

2 

0.71 

29 

44 

1455.05 

1.29 

29 

1699 

1455.85 

2 

1.19 

50 

74 

1454.50 

3 

0.34 

11 

21 

1453.14 

0.71 

11 

1091 

1455.72 

NINE-NODE 

3 

1.22 

39 

66 

1453.13 

4 

0.27 

9 

20 

1453.31 

0.55 

9 

704 

1455.84 

4 

1.18 

38 

74 

1453.25 

5 

0.22 

6 

16 

1453.15 

0.26 

6 

171 

1455.44 

5 

0.47 

14 

28 

1453.14 

2 

0.99 

40 

66 

85085.3 

1.94 

40 

3804 

85098.5 

2 

1.21 

50 

85 

85072.5 

DUPUIS 

3 

0.59 

22 

38 

85047.6 

1.28 

22 

1963 

85099.7 

3 

0.87 

32 

60 

85038.8 

4 

0.20 

6 

12 

85046.8 

0.32 

6 

223 

85097.5 

4 

0.22 

7 

13 

85027.6 
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Table  3-5:  Number  of  Shortest  Path  Calculations  to  achieve  the  Relative 
Error. 


Algorithm 

% 

DUPUIS 
.10%  .05% 

Relative  Error 
HULL 

.10%  .05% 

WINNIPEG 
1.0%  0.5% 

r=l 

87 

NA 

43 

71 

29 

43 

r=2 

22 

55 

RSD  r=3 

11 

11 

15 

19 

r=4 

6 

6 

18 

26 

r=5 

12 

16 

r=9 

12 

16 

15 

20 

FW 

NA 

NA 

37 

70 

29 

45 

Guelat  [1983]  Partan 

11 

28 

15 

21 

18 

27 

Restriction 

9 

9 

10 

13 

17 

25 

Quadratic  Approx. 

22 

27 

25 

36 

Dembo  and 

FW 

65 

171 

46 

80 

Tulowitzki 

TQP-FW 

32 

56 

35 

52 

[1983]* 

Partan 

23 

41 

34 

47 

TQP-PT 

32 

60 

31 

48 

NA  = The  algorithm  terminated 

before  the  indicated 

Relative 

Error 

was 

obtained. 

Restriction  = A strategy  similar  to  RSD  with  restart  every  6 iterations. 
Quadratic  Approx.  = Guelat's  variation  of  Truncated  Quadratic  Prog. 
TQP-FW  = Truncated  Quadratic  Programming  using  FW  on  the  Quadratic 
Subprobl em. 

TQP-PT  = Truncated  Quadratic  Programming  using  Partan  on  the  Quadratic 
Subprobl em. 

* = The  Relative  Error  as  reported  in  Dembo  and  Tulowitzki  [1983]  is 
based  on  the  lower  bound  calculated  at  the  current  iterate. 
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small  number  of  utilized  paths  implies  a small  number  of  extreme  points 
in  the  optimal  face. 

3.7.3  Electrical  Network  Problems 

These  models  arise  in  the  study  of  conducting  fibers  (sticks) 
imbedded  in  an  insulating  polymer  [Balberg  et  al . 1983].  The 
conductivity  in  the  composite  is  induced  when  the  sticks  form  a 
continuous  network  from  one  boundary  to  its  opposite.  In  these 
networks,  nodes  (junctions)  represent  the  sticks  and  the  boundaries,  and 
arcs  (resistors)  represent  the  intersections  formed  by  overlapping 
sticks.  It  is  well  known  that  the  equilibrium  conditions  of  an 
electrical  network  are  attained  as  the  total  energy  loss  is  minimized 
(see,  e.g. , Cooper  and  Kennington  [1977]).  These  networks  of  resistors 
have  a single  origin  and  a single  destination.  If  the  total  flow  is 
assumed  to  be  one  unit  and  the  resistence  in  each  arc  is  also  one  unit, 
then  the  total  energy  loss  equals  the  effective  resistance  of  the 
network.  The  formulation  of  the  problem  is 

n 2 
min  z x . 

i=l  1 

S.t.  A X = b 

where  x^  denotes  the  current  on  arc  i. 

Table  3-6  summarizes  the  results  for  RSD  on  five  large-scale 
electrical  networks  taken  from  the  studies  of  Balberg  et  al . [1983]. 
These  problems  are  of  interest  in  this  study  because  they  are 
pathological  (to  RSD)  in  the  sense  that  many  paths  from  the  origin  to 
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the  destination  will  be  utilized,  and  this  implies  many  extreme  points 
defining  the  optimal  face.  In  fact,  it  is  easy  to  verify  that  the 
number  grows  exponentially  with  the  size  of  the  network.  The  lower 
bounds  were  obtained  by  solving  the  unconstrained  quadratic  dual  problem 
by  the  conjugate  gradient  method  (see  Subsection  4.2.2).  While  the 
relative  error  is  shown  to  decrease  with  increasing  r,  the  increase  in 
CPU  time  (IBM  3081D)  does  not  justify  the  decrease  and  the  basic  Frank- 
Wolfe  method  is  not  significantly  improved  upon.  For  very  large  values 
of  r=30  and  r=50,  this  performance  further  demonstrates  that  the 
computational  burden  of  the  larger  master  problem  is  too  great  relative 
to  the  effort  required  to  solve  the  subproblem. 

Subsection  4.2.2  shows  that  the  dual  of  this  problem  can  be  solved 
much  more  efficiently  by  the  conjugate  gradient  method  in  conjuction 
with  sparse  matrix  techniques. 

3.7.4  Water  Distribution  Problems 

These  models  deal  with  determining  the  steady  state  flows  and 
pressures  in  a water  distribution  network.  Collins  et  al . [1978]  give  a 
precise  discussion  and  formulation  of  these  models  as  optimization 
problems.  The  formulation  is 

Q P 

min  Z f - (x , ) + Z c,y, 
i=l  1 1 j=l  J J 

s.t.  x + y = b 


1 s x < u 


Table  3-6:  RSD  Results  for  Electrical  Network  Problems  (Newton  Method). 
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where  some  of  the  f.(*)  are  of  the  form 


xi  2 1 

-(b,(a,  - t/))!  dt 

0 1 1 


bi! 


xT<ai 


2,  f 

x.)2+  a.  sin  i 

V 


-1  xil 


and  the  rest  of  the  f..(*)  are  of  the  form 

f-jtx,)  = •c1|xi|2*85 


where  a.,b.  and  k.  are  constants. 

We  have  tested  on  the  IBM  3081D  the  Dallas  water  distribution  model 
(see  Table  3-7)  which  has  been  employed  as  a test  problem  for  a number 
of  studies  [Collins  et  al . 1978,  Dembo  and  Klincewicz  1981,  Beck  et  al . 
1983,  Dembo  1983,  Klincewicz  1983,  and  Kamesam  and  Meyer  1984].  This 
problem  has  the  potential  for  having  a large  number  of  extreme  points  in 
the  optimal  face.  Thus,  one  can  expect  a result  similar  to  that  of  the 
electrical  networks.  Nevertheless,  it  is  encouraging  that  the  objective 
function  value  for  r=4  is  lower  than  the  previous  best  value  -206175.2 
[Kamesam  and  Meyer  1984]  and  was  obtained  in  approximately  one  minute  of 
CPU  time.  It  should  be  noted  that  the  inaccuracies  in  the  approximating 
functions  of  the  Dallas  problem  induce  negative  cycles  in  the  solution 
and  preclude  the  calculation  of  a lower  bound  in  the  standard  manner 
[Kennington  1984]. 
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Table  3-7:  RSD  Results  for  the  Water  Distribution  Problem  (Newton  Method). 


Problem 

r 

CPU 

Sec. 

No. 

Iter. 

No. 

Proj. 

Final 
0b  j . 

Term 

DALLAS 

1 

56.14 

88 

131 

-205676.7 

ND 

(667/1796/425) 

2 

54.88 

90 

90 

-205888.4 

ND 

Col  1 ins  et  al . 

3 

53.08 

87 

87 

-205853.3 

ND 

[1978] 

4 

61.68 

103 

103 

-206179.8 

ND 

5 

46.62 

73 

73 

-205780.8 

ND 

CHAPTER  FOUR 


DUAL  METHODS  FOR  SEPARABLE  STRICTLY  CONVEX 
QUADRATIC  NETWORK  PROBLEMS 

4.1  Introduction 

The  electrical  network  problems  discussed  in  Subsection  3.6.3  are  a 
clear  example  of  a case  where  the  RSD  algorithm  does  not  perform  well  in 
comparison  to  other  methods  such  as  the  FW  algorithm  (special  case  of 
RSD  when  r=l).  This  is  due  to  the  fact  that  the  dimension  of  the 
optimal  face  is  close  to  n (the  number  of  variables)  and,  therefore,  in 
order  to  be  able  to  generate  a simplex  containing  the  optimal  solution, 
the  parameter  r has  to  be  very  large.  This  then  makes  the  repeated 
solving  of  the  master  problem  ineffective  in  terms  of  CPU  time  compared 
to  the  CPU  time  required  to  solve  the  linear  subproblem  (a  simple 
shortest  path  calculation).  However,  the  dual  of  this  problem  is  an 
unconstrained  convex  quadratic  program,  the  Hessian  is  sparse,  and  the 
structure  is  simple  due  to  the  network  constraints. 

This  chapter  is  concerned  with  the  application  of  conjugate 
gradient  based  methods  to  solve  dual  formulations  of  separable  strictly 
convex  quadratic  network  problems.  Two  different  cases  are  considered: 
the  first  one  specializes  the  classical  conjugate  gradient  method  of 
Hestenes  and  Stiefel  [1952]  and  a preconditioned  version  to  networks 
with  undirected  arcs;  and  the  second  applies  a dual  based  ascent  method 
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to  networks  where  the  flows  on  the  arcs  are  subject  to  upper  and  lower 
bounds.  The  ascent  directions  of  this  method  are  also  generated  by 
conjugate  gradient  formulas. 

It  is  important  to  point  out  that,  although  this  chapter  is 
concerned  with  pure  network  problems,  the  extension  of  these  methods  to 
generalized  networks  is  straightforward  and  can  be  done  in  a similar 
manner. 

4.2  CASE  I:  Networks  with  Undirected  Arcs 
Consider  the  following  problem 


matrix,  and  A is  an  mxn  node-arc  incidence  matrix  of  a graph  G(V,  E). 
The  Wolfe  dual  [Wolfe  1961]  of  (4.2.1)  is 


s.t.  A x = b 


(4.2.1) 


where  c € Rn,  x € Rn,  b € Rm,  D is  a diagonal  positive  definite  nxn 


max  \ xTDx  + cTx  + mT(Ax  - b) 

s.t.  Dx  + c + ATu  = 0 


(4.2.2) 


where  u € Rm  is  the  vector  of  dual  variables. 

We  can  eliminate  the  x variables  by  using  the  relation 


x = -D_1(ATu  + c) 


and  the  dual  problem  becomes 


min  g(u)  = J yTAD-1ATji  + (D_1ATc  + b)Ty  + \ cTD-1c  (4.2.3) 


T.n-1.T 
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where  the  Hessian  matrix  is  H = AD_1AT  and  the  linear  coefficients  are 
5 = D 1ATc  + b.  H and  5 can  be  computed  directly  by  using  the  network 
data  structure  as  follows 
n 


Hii  = .vi/Dij)Aij = E 1/D- 

J=1  J J 

where  E.  = {j  : (i.j)  6 E} 


J=1  'd  'J  j€E. 


i • j m 


Hij  = |<f1(1/Dik)AikAjk  ‘ ‘ 


-1/Dij 


0 


BU  = '1/Dij><ci  -‘j>  *bij 


if  (i.j)  € E 


otherwise 


1 + J 


(i.j)  € E. 


Note  that  the  matrix  H is  positive  semidefinite  since  y^Hy  = 


(D  2A  y)  (D  2A  y)  = ||D  2 A 1 y || ^ 1 0 for  all  y E Rm.  This  implies  that  the 
dual  problem  (4.2.3)  is  a convex  quadratic  program.  Therefore,  y*  is  a 
solution  of  (4.2.3)  if  and  only  if  5g(y*)  = 0.  Thus 


» T 2 


5g(y  ) = H y +5=0. 


(4.2.4) 


The  solution  of  this  system  of  linear  equations  is  discussed  in  the 
next  subsection. 


4.2.1  Conjugate  Gradient  Method 

There  is  a substantial  class  of  large-scale  systems  of  linear 
equations  for  which  the  coefficient  matrix,  H,  is  sparse  and  its 
elements  may  be  generated  by  some  simple  formula.  For  instance,  in 
(4.2.4)  the  elements  of  H are  easily  determined  by  the  structure  of  the 
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network.  Therefore,  it  is  desirable  to  solve  such  linear  systems  by 
iterative  methods  that  never  alter  H.  Such  a method  is  the  conjugate 
gradient  (CG)  algorithm  originally  introduced  by  Hestenes  and  Stiefel 
[1952]  to  solve  square  systems  of  linear  equations,  and  applied  to 
nonlinear  problems  by  Fletcher  and  Reeves  [1964],  Polak  and  Ribiere 
[1969],  and  others.  Below,  we  present  this  algorithm  and  show  how  the 
matrix  operations  can  be  carried  out  for  the  linear  system  (4.2.4). 


CG  Algorithm 


0 


Step  0:  Set  yu  = 0,  r°  = b,  du  = -6  and  k = 0. 


Step  1:  Compute 


k 

a = 


-rkTdk 

dkTHdk 


k*l  _ k k^k 

y = y + a d 

„k+l  I.  k +1  r 
r = Hy  + b 


and  go  to  2. 
Step  2:  Compute 


Bk  = 


k+lT  k+1 
r r 

rk  krk 


dktl  = -r^l  + 6kdk 


and  go  to  3. 

Step  3:  If  ||r*<  + 1||  s €,  terminate. 

Otherwise,  set  k = k + 1 and  go  to  1. 
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In  theory  the  directions  dk  are  H-conjugate,  i.e.,  dkHdk'  = 0 for 
all  k + k',  and  yk  minimizes  the  function  in  (4.2.3)  on  the  manifold 
defined  by  y°,  d°,  ....  dk_1. 

Note  that  the  CG  algorithm  is  efficient  in  the  sense  that  only  two 
expressions  with  matrix  operations  are  involved:  dk^Hdk  and  Hyk.  We 
show  below  that  these  expressions  can  be  carried  out  very  easily  by 
using  only  node-arc  network  information, 
i)  dkTHdk: 

dkTHdk  = dkTAD_1ATdk 


k.  2 


(1J)€E 


(1/DijHd-  - 9 


ii)  Hy 


(H/)i  -^Vj 


= H..y. 


- i 


j«Ei 


n k 
U . .y  . 

■»J  J 


1=1,. 


where  Ei  = {j  : (i,j)  € E}. 

Substituting  (i)  and  (ii)  in  Step  1 of  the  CG  algorithm,  one 
obtains  a version  of  the  algorithm  to  solve  (4.2.4)  which  does  not 
involve  an  explicit  representation  of  H.  In  terms  of  storage 
r egu i remen t s , the  mxm  matrix  H has  been  replaced  by  a vector  of  size  m 
to  store  the  diagonal  elements  of  H and  a 3xn  matrix  to  store  the 
information  of  the  arc  set  E.  This  exploitation  of  the  sparsity  of  H is 
similar  to  the  approach  suggested  by  Dembo  [1979]  and  by  Klincewicz 
[1983]  in  the  adaptation  of  Newton  methods  to  network  problems. 
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In  general,  if  the  matrix  H has  q s m distinct  eigenvalues,  the  CG 

algorithm  will  converge  in  q steps  with  exact  arithmetic.  With  inexact 

arithmetic,  however,  the  accuracy  of  the  solution  and  the  rate  of 

convergence  is  determined  from  the  eigenvalue  structure  of  H (see,  e.g.. 

Axel sson  [1977]  or  Jennings  [1977]).  Jennings  shows  that,  if  H is 

positive  definite  and  there  is  no  rounding  error,  the  number  of 

iterations  for  the  CG  method  to  achieve  a certain  accuracy  depends  on 

the  condition  number  of  H,  i.e.,  cond  H = e^e^  where  en  and  ex  are  the 

largest  and  smallest  eigenvalues  of  H.  When  most  of  the  eigenvalues  are 

in  a relatively  small  interval  compared  to  the  whole  spectrum,  e - e, 

n 1 ’ 

the  method  will  reach  the  same  accuracy  more  rapidly.  Jennings  shows 
that  if  the  stopping  criteria  of  the  CG  method  is  ||rk||/||b||  s €,  the 
number  of  iterations  to  achieve  a tolerance  € is 

cosh-1(l/6) 

k s -j  — _ 

cosh  [(en  + ei)/(en  - .ej)] 

and  if  6 <<  1 and  en  >>  e^,  the  above  inequality  can  be  approximated 
by 

k * i (en/ei)'  ln(2/€). 

These  arguments  show  that  it  will  be  helpful  if  the  system  of 
equations  (4.2.4)  could  be  transformed  into  an  equivalent  system  with 
better  conditioning.  First  of  all,  the  matrix  H is  required  to  be 
positive  definite.  This  can  be  easily  done  by  eliminating  the  first  row 
and  column  of  H and  setting  the  associated  variable  to  zero,  i.e., 

Vj  = 0.  Secondly,  to  improve  the  conditioning  there  are  two  methods: 
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scaling  and  preconditioning.  An  example  of  scaling  is  to  divide  each 
entry  H^. . of  H by  the  norm  of  the  ith  row  and  the  norm  of  the 
column,  and  scaling  the  vectors  y and  b accordingly.  Although  this 
method  is  heuristic,  if  H has  entries  of  different  magnitude,  the 
scaling  can  often  improve  the  condition  number  of  H [Nickel  and  Tolle 
1984].  In  applying  preconditioning,  it  is  necessary  to  find  a matrix 
C,  known  as  the  preconditioning  matrix,  such  that  the  condition  number 
of  C *H  is  better  than  that  of  H.  The  ideal  is  to  choose  C = H,  but  the 
computation  of  H"1  is  more  difficult  than  solving  (4.2.4). 

The  preconditioned  conjugate  gradient  (PCG)  method  [Axelsson  1977] 
presented  below  does  not  form  any  inverse  in  computation  but,  in  theory, 
it  solves  a system  of  linear  equations  equivalent  to  (4.2.4)  given  a 
positive  definite  preconditioning  matrix  C. 


PCG  Algorithm 


Step  0:  Set  = 
Step  1:  Solve 

and  set 
Step  2:  Compute 


0,  ru  = 0,  and  k = 0. 


C = r° 
d° . 


rkTrk 


dkTHdk 

= pk  ♦ « 


kdk 
+ b 
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Step  3:  Solve 


CrM  = rktl 


and  compute 


8k  = 


rk+lT;k+l 

rkT~rk 


dk+1  = -;k+l  + 0kdk 

Step  4:  If  (rk+1,  rk+1)*  s €,  terminate. 

Otherwise,  set  k = k + 1 and  go  to  2. 


Hestenes  [1980]  shows  that  the  PCG  algorithm  is  equivalent  to 

applying  the  basic  CG  to  a system  of  linear  equations  whose  coefficient 

- 1 - 1 1 
matrix  is  C 2HC  2,  which  is  similar  to  C H. 

The  PCG  algorithm  has  an  additional  step  with  respect  to  the  basic 

algorithm,  which  requires  the  solution  of  the  system  Cr  = r.  Thus,  this 

method  will  be  practical  only  if  C is  close  to  H in  the  sense  that  the 

-1 

condition  number  of  C H is  smaller  than  that  of  H,  and  Cr  = r is  easy 
to  solve,  i.e.,  requires  a small  number  of  operations  and  small  memory 
requirements.  Many  authors  (i.e.,  Axelsson  [1977],  Kershaw  [1978], 
Munksgard  [1980]  and  others)  have  studied  several  preconditioning 
matrices  for  different  type  of  problems.  Their  results  show  that  when 
the  problem  has  special  structure  the  PCG  method  can  be  very  efficient. 
The  matrix  used  in  our  computational  experiments  of  Subsection  4.2.2  is 
a scaling  matrix  defined  as 


r 


i.  j = i, 


m 


0 


otherwi se. 


•••  9 
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4.2.2  Computational  Experiments 

A Fortran  program  for  the  CG  algorithm  described  in  Subsection 
4.2.1  was  written  and  executed  on  an  IBM  3081D  to  test  the  five  sticks 
problems  described  in  Subsection  3.6.3.  The  program  required  only  110 
lines  of  code.  By  dividing  the  number  of  arcs  by  the  number  of  nodes, 
the  average  ratio  of  the  five  problems  is  2.48,  indicating  that,  while 
these  networks  contain  a large  number  of  arcs,  they  are  rather  sparse. 

Table  4-1  reports  the  number  of  iterations  (Iter.)  required  to 
achieve  a residual  error  (||r||)  of  at  least  lO”1,  10~2,  1 0-3 , 10~4  and 
10  ^ for  the  five  problems.  It  also  displays  the  percent  relative 
error  (%  RE)  calculated  with  respect  to  the  solution  which  achieves  a 
residual  error  less  than  10~10,  the  objective  function  value  (Obj.  Fun.) 
and  the  CPU  time  (CPU  Sec.).  It  is  important  to  note  that  the  CPU  time 
per  iteration  varies  from  0.0037  sec.  for  S-l  to  0.023  sec.  for 
S-5.  This  shows  that  the  sparse  matrix  techniques,  when  incorporated 
into  the  CG  algorithm,  produce  a very  efficient  method  for  this  type  of 
problem. 

The  above  program  was  modified  to  implement  the  PCG  algorithm. 

Table  4-2  shows  the  results  obtained  for  the  five  problems  in  a 

similar  manner  as  Table  4-1,  except  that  ||r||  has  been  replaced  by 
- 1 

(r,r)2.  In  this  case  the  CPU  time  per  iteration  varies  from  0.0044  sec. 
for  S-l  to  0.026  sec.  for  S-5.  Although  the  CPU  time  per  iteration 
increases  with  respect  to  the  standard  method,  the  total  number  of 
iterations  has  been  reduced. 
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Table  4-3  reports  a comparison  between  the  FW  algorithm  and  the  two 
conjugate  gradient  methods  (CG  and  PCG).  Based  on  the  CPU  times,  it  is 

clear  that  the  CG  method  is  far  superior  to  the  FW  technique.  In  the 

worst  case,  S-3  with  RE  < 10%,  CG  obtains  a 28.4%  reduction  in  CPU  time 
compared  to  FW.  On  average,  the  reduction  is  approximately  72.2%.  PCG 
requires  less  CPU  time  than  CG  in  twenty  cases,  the  same  in  one  case, 
and  more  in  four  cases.  However,  the  average  reduction  in  CPU  time 

using  the  PCG  method  is  only  8.1%  compared  to  the  standard  version. 

4.3  Case  II:  Networks  with  Upper  and  Lower  Sounds  on  the  Arcs 

Consider  the  following  problem 

min  f(x)  = | x^Dx  + c^x 

s.t.  A x = b (4.3.1) 

1 S x S u 

where  c € Rn,  1 € (R  U {±»})n,  u € (R  U {±»})n,  x € Rn,  b € Rm,  D is  a 

diagonal  positive  definite  nxn  matrix,  and  A is  an  mxn  node-arc 

incidence  matrix  of  a graph  G ( V , E). 

A first  approach  for  (4.3.1)  is  to  solve  its  Wolfe  dual  in  a 
similar  manner  to  the  unbounded  case.  The  Wolfe  dual  of  (4.3.1)  after 
the  elimination  of  the  primal  variables  becomes 
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Table  4-1:  Computational  Experience  with  the  Conjugate  Gradient 
Algorithm. 


Problem 

Iter. 

INI 

% RE 

Obj.  Fun. 

CPU  Sec. 

55 

l.E-1 

0.15 

6.92401934 

0.20 

76 

l.E-2 

0.20E-2 

6.93426068 

0.28 

S-l 

98 

l.E-3 

0.30E-4 

6.93438971 

0.37 

(237/454) 

110 

l.E-4 

0.00 

6.93439180 

0.41 

148 

l.E-10 

0.00 

6.93439180 

0.55 

59 

l.E-1 

0.70 

3.10269593 

0.63 

114 

l.E-2 

0.61E-2 

3.12437243 

1.24 

S-2 

148 

l.E-3 

0.58E-3 

3.12454425 

1.61 

(722/1412) 

183 

l.E-4 

0.31E-4 

3.12456167 

2.00 

297 

l.E-10 

0.00 

3.12456263 

3.26 

144 

l.E-1 

0.30 

11.14635139 

1.91 

196 

l.E-2 

0.49E-2 

11.17922315 

2.60 

S-3 

230 

l.E-3 

0.54E-4 

11.17976977 

3.06 

(952/1686) 

287 

l.E-4 

0.89E-9 

11.17977526 

3.82 

447 

l.E-10 

0.00 

11.17977527 

5.97 

S-4 

(852/2264) 

53 

95 

121 

153 

278 

l.E-1 

l.E-2 

l.E-3 

l.E-4 

l.E-10 

1.54 

0.96E-2 

0.96E-4 

0.13E-4 

0.00 

1.54266758 

1.56604357 

1.56619310 

1.56619443 

1.56619463 

0.83 

1.50 

1.92 

2.44 

4.44 

32 

l.E-1 

1.86 

0.45877002 

0.72 

51 

l.E-2 

0.18E-1 

0.46740087 

1.16 

S-5 

75 

l.E-3 

0.34E-3 

0.46748259 

1.72 

(902/3699) 

102 

l.E-4 

0.21E-7 

0.46748414 

2.35 

188 

l.E-10 

0.00 

0.46748415 

4.36 
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Table  4-2: 

Computational  Experience 
Gradient  Algorithm. 

with  the 

Preconditioned  Conjugate 

Problem 

Iter. 

(r,r)J 

% RE 

Obj.  Fun. 

CPU  Sec. 

42 

l.E-1 

1.53 

6.82844300 

0.18 

67 

l.E-2 

0.37E-2 

6.93413347 

0.30 

S-l 

75 

l.E-3 

0.42E-4 

6.93438886 

0.33 

(237/454) 

81 

l.E-4 

0.17E-4 

6.93439063 

0.36 

105 

l.E-10 

0.00 

6.93439180 

0.46 

29 

l.E-1 

8.92 

2.84571546 

0.36 

74 

l.E-2 

0.10 

3.12133159 

0.94 

S-2 

106 

l.E-3 

0.28E-3 

3.12455375 

1.35 

(722/1412) 

128 

l.E-4 

0.64E-5 

3.12456248 

1.64 

222 

l.E-10 

0.00 

3.12456263 

2.86 

S-3 

(952/1686) 

80 

139 

162 

198 

312 

l.E-1 

l.E-2 

l.E-3 

l.E-4 

l.E-10 

2.73 

0.11E-1 

0.28E-3 

0.00 

0.00 

10.87415707 

11.17851078 

11.17974343 

11.17977527 

11.17977527 

1.26 

2.20 

2.57 

3.15 

4.98 

26 

l.E-1 

6.63 

1.46238458 

0.45 

56 

l.E-2 

0.11 

1.56454223 

1.00 

S-4 

81 

l.E-3 

0.56E-3 

1.56618582 

1.46 

(852/2264) 

107 

l.E-4 

0.45E-5 

1.56619456 

1.94 

198 

l.E-10 

0.00 

1.56619463 

3.61 

4 

l.E-1 

70.90 

0.13602842 

0.09 

33 

l.E-2 

0.17 

0.46667111 

0.83 

S-5 

54 

l.E-3 

0.15E-2 

0.46747715 

1.38 

(902/3699) 

71 

l.E-4 

0.86E-5 

0.46748411 

1.82 

136 

l.E-10 

0.00 

0.46748415 

3.51 
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Table  4-3:  Comparison  of  Frank-Wolfe  (FW),  Conjugate  Gradient  (CG),  and 
Preconditioned  Conjugate  Gradient  (PCG)  Techniques. 

CPU  Time  in  Seconds  (Number  of  Iterations) 


Problem  Method  RE  < 10%  RE  < 5%  RE  < 3%  RE  < 1%  RE  < 0.5% 


FW 

0.36 

( 

6) 

0.49 

( 

9) 

0.58 

( 

11) 

0.88 

( 

18) 

1.31 

( 

28) 

CG 

0.15 

( 

41) 

0.16 

( 

43) 

0.16 

( 

44) 

0.18 

( 

49) 

0.19 

( 

51) 

PCG 

0.14 

( 

33) 

0.16 

( 

36) 

0.17 

( 

39) 

0.20 

( 

46) 

0.22 

( 

50) 

FW 

2.62 

( 

23) 

3.48 

( 

33) 

4.34 

( 

43) 

7.71 

( 

82) 

* 

CG 

0.37 

( 

35) 

0.46 

( 

44) 

0.53 

( 

50) 

0.61 

( 

57) 

0.67  ( 

63) 

PCG 

0.34 

( 

28) 

0.45 

( 

36) 

0.50 

( 

40) 

0.60 

( 

48) 

0.68  ( 

54) 

S-3 


FW 

CG 

PCG 


1.55  ( 13) 
1.11  ( 84) 
1.04  ( 66) 


2.14  ( 19) 
1.29  ( 98) 
1.12  ( 71) 


2.75  ( 25) 
1.48  (112) 
1.25  ( 79) 


4.96  ( 47) 
1.72  (130) 
1.57  ( 99) 


7.57  ( 73) 
1.85  (140) 
1.74  (110) 


S-4 


S-5 


FW 

6.11 

( 

47) 

8.47 

( 

66) 

* 

★ 

* 

CG 

0.53 

( 

34) 

0.62 

( 

40) 

0.72 

( 46) 

0.89  ( 57) 

0.98  ( 62) 

PCG 

0.42 

( 

24) 

0.51 

( 

29) 

0.54 

( 31) 

0.71  ( 40) 

0.80  ( 45) 

FW 

* 

★ 

* 

* 

* 

CG 

0.49 

( 22) 

0.56 

( 25) 

0.65 

( 29) 

0.70 

( 31) 

0.84  ( 

37) 

PCG 

0.41 

( 17) 

0.49 

( 20) 

0.55 

( 22) 

0.67 

( 27) 

0.73  ( 

29) 

* = When  FW  terminates  at  the  100  th  iteration,  the  indicated  relative  error 
was  not  achieved. 
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min  g (y , v,  w)  = | [yT,  vT,  wT] 

ad_1at 

AD-1 

-AD-1 

y 

D'1 

-D_1 

V 

1 

1 

CD 

1 

1 — » 
n=» 

— i 

-D"1 

D"1 

w 

+ [cTD-1AT+bT,  cTD-1+uT,  -cTD-1-1T] 


+ \ cTD_1c 


s.t.  v * 0 
w £ 0 


(4.3.2) 


where  y € Rm,  v € Rn  and  w € Rn  are  the  vectors  of  dual  variables 
corresponding  to  the  equality  constraints,  upper  and  lower  bounds, 
respectively. 

In  recent  years,  several  authors  such  as  O'Leary  [1980],  Dembo  and 
Tulowitzki  [1983a]  and  Yang  and  Tolle  [1985]  have  developed  different 
versions  of  the  CG  method  for  convex  quadratic  problems  subject  to  box 
constraints  (upper  and  lower  bounds  only).  They  essentially  use  this 
iterative  procedure  to  carry  out  the  fundamental  step  of  equation 
solving  in  an  active-set  method.  Also  the  projected  Newton  method  of 
Bertsekas  (see  Subsection  3.5.1)  can  be  extended  to  box  constrained 
problems.  All  of  these  iterative  schemes  have  the  property  of  finite 
convergence  under  exact  arithmetic.  The  Dual  Problem  (4.3.2)  is  a 
special  case  of  a box  constrained  problem  with  some  lower  bounds  only 
and,  therefore,  these  methods  can  be  specialized  to  this  problem  taking 
advantage  of  the  sparsity  of  the  Hessian.  However,  since  the  size  of 
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the  dual  problem  is  m+2n  variables,  this  approach  may  not  be  very 
promising  for  large  networks. 

For  this  reason,  we  take  the  approach  of  solving  the  following 
Lagrangian  dual  formulation  of  (4.3.1) 


s.t.  1 s x s u. 

The  structure  of  the  primal  problem  allows  us  to  express  the 
function  g(«)  in  a simpler  form 


Problem  (4.3.3)  is  unconstrained.  The  two  theorems  below  show  that 
the  objective  function  g(*)  is  concave  and  differentiable. 

Theorem  4.3.1:  The  function  g(y)  is  concave. 

Proof:  The  result  follows  because  g(«)  is  the  pointwise  minimum  of 
functions  linear  in  y.  See  Theorem  4.1.13  in  Mangasarian  [1969]. | 

The  following  assumption  and  two  lemmas  are  necessary  for  the  proof 
of  the  second  theorem. 

Assumption  4.3.1:  The  set  X,  defined  as  X = {x  : 1 S x <,  u},  is  bounded. 
Lemma  4.3.1:  For  any  y € Rm  the  set  G(y),  defined  as  G(y)  = 
arg  min  xTDx  + cTx  + yT(Ax  - b):  x € X},  is  a singleton. 


(4.3.3) 


g(u)  = min 


s.t.  1 $ x £ u 


where  D^,  for  any  (i,j)  € E,  is  a diagonal  element  of  D. 


Proof:  The  result  follows  from  the  positive  definiteness  of  D.| 
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Lemma  4.3.2:  Let  {y  } be  an  infinite  sequence  of  dual  variables 
converging  to  y,  and  let  xk  € G(yk)  for  all  k.  Then  {xk}  converges  to 
where  x € G(y). 

Proof:  (By  contradiction)  Assume  that  there  exists  an  6 > 0 such  that 
k 

II x “ x ||  > € for  all  k.  Since  X is  compact,  there  exists  a convergent 
subsequence,  {xk)K,  with  limit  x"  in  X.  Note  that  ||x"  - xfl  > €. 
Furthermore,  for  each  k 6 K we  have 


? xkTDxk  + cTxk  + ykT(Axk  - b)  s i xTDx  + cTx  + ykT(Ax  - b). 

Taking  the  limit  as  k € K approaches  <*>,  {yk}  — > y and 
— > x".  It  follows  that 

\ x"TDx"  + cTx"  + yT(Ax"  -b)  xTDx  ♦ cTx  + yT(Ax  - b). 


Therefore,  x"  € G(y)  which  implies  that  G(y)  is  not  a singleton,  a 
contradiction  to  Lemma  4.3. l.§ 

Theorem  4.3.2:  The  function  g(y)  is  differentiable. 

Proof:  Given  y^  y2  € Rm,  X]_  6 G(y:)  and  x2  € G(y2).  The  following  two 
inequalities  hold  from  the  definition  of  g(*): 


g(^)  - g(u2)  * f(x2)  + (Ax2  “ b)  " f(x2)  ‘ ^2(Ax2  ‘ b) 

= (Wx  - u2)T(Ax2  " b) 

9Cp2)  - g (Vi)  * f (x1)  + yJ(AXj.  - b)  - f(Xj)  - y[(Ax1  - b) 
= (w2  - h1)T(Ax1  - b). 
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These  inequalities  imply  that 

0 * gfwj)  - g(u2)  “ (u1  - u2)T(Ax2  - b) 

;>  (Vj_  - u2)T(AXl  - b)  - (yx  - y2)T(Ax2  - b) 

= (yl  ' y2)TA(xl  ‘ x2^  * "lyl  " y2l  iA(xl  " x2)  II 


g(Wi)  - g (u2)  - (Vi  - u7)T(Ax?  - b) 
o * * ~ II A ( x , - x p ) 

IK  - y2l 


By  Lemma  4.3.2,  x^  — > x2  as  y^  — > y2.  Therefore 

g(wi)  “ g(y2)  ‘ (Hi  * p?)T(Ax?  - b) 

lim  £ £ = o. 

yl->y2  lyi  ' y2l 


Hence,  g(y)  is  differentiable.  The  gradient  of  g(»)  at  y is 
<5g(y)  = Ax  - b.g 

* * 

Therefore,  y solves  (4.3.3)  if  and  only  if  Sg(y  ) = 0,  i.e., 

5g(y  ) = Ax  - b = 0.  (4.3.4) 

The  next  subsection  presents  a dual  based  iterative  method  for 
(4.3.3)  that  generates  a convergent  sequence  of  dual  variables  { yk } and 
a corresponding  sequence  of  primal  variables  {xk}  converging  to  a primal 
feasible  solution. 


4.3.1  Dual  Ascent  Method 

In  this  subsection,  we  utilize  the  properties  of  g(*)  to  develop  a 
general  scheme  to  solve  (4.3.3).  This  is  based  on  the  following 
procedure. 
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Dual  Ascent  Algorithm 

Step  0:  Let  y^  = 0 and  set  k = 0. 

Step  1:  Find  an  ascent  direction  dk  and  go  to  2. 
Step  2:  Let 

k k k 

a € arg  max  g(y  + ad  ) . 

Set  y = y + aKdK  and  go  to  3. 

Step  3:  If  ||6g(yk+^)  ||  s €,  terminate. 

Otherwise,  set  k = k + 1 and  go  to  1. 

4.3. 1.1  Computation  of  dk 


Since  g(«)  is  concave  and  differentiable,  the  direction  dk  can  be 

k 

computed  in  terms  of  5g(y  ),  which  includes  the  steepest  ascent,  quasi- 
Newton  and  CG  methods.  The  method  of  steepest  ascent  usually  works 
quite  well  during  early  iterations,  but  it  performs  poorly  as  the 
optimum  is  approached.  Conventional  quasi-Newton  methods  require  too 
much  storage  to  be  practical  for  large  networks.  The  only  alternatives 
requiring  only  first  derivatives  and  able  to  achieve  rapid  convergence 
are  CG  methods.  Among  the  CG  methods  for  nonquadratic  objectives  we 
have  the  following  updating  formulas: 
i)  Fletcher  and  Reeves  [1964]: 


= 5g(pM)  . 


*<+lxT-  , k + 1. 
5g(y  ) 6g (y  ) 

Sg(yk)T5g(yk) 
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ii)  Polak  and  Ribiere  [1969]: 


wk+l  - xr,r  k+1. 
d - 6g(y  ) + 


Xnf  k+l.T  k 
<5g  (y  ) y 

<5g(yk)TSg(yk) 


where  yk  = <5g(yk+1)  - 6g(yk). 


1/ 

4.3. 1.2  Computation  of  a 

We  now  propose  a specific  algorithm  for  solving  the  line  search 
subproblem  (Step  2)  which  can  be  stated  as 


Find  a € arg  max  g(y  + ctd  ) (4.3.5) 

where  g(yk  + adk)  = min  \ xTDx  + cTx  + (yk  + adk)T(Ax  - b) 

s.t.  1 S x S u.  (4.3.6) 


Since  g(*)  is  concave  and  differentiable,  a necessary  and 
sufficient  optimality  condition  for  a is 

6g(yk  + akdk)Tdk  = (Axk  - b)Tdk  = 0 (4.3.7) 

k v 

where  x is  the  solution  of  (4.3.6)  at  o . 

Thus,  imposing  the  condition  (4.3.7),  Problem  (4.3.5)  becomes 

min  i xTDx  + cTx  + ykT(Ax  - b) 
kT 

s.t.  dK  (Ax  - b)  = 0 (4.3.8) 

1 S x S u 

k 

and  a is  the  optimal  Lagrangian  multiplier  of  the  equality  constraint. 
This  problem  transformation  has  also  been  discussed  by  Linn  and  Pang 
[1985]  in  the  study  of  iterative  methods  for  convex  quadratic  programs. 
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(4.3.8)  can  be  rewritten  as  follows 


min  i i D..xt  + z c!.x..  + e 
(i,j)€E  1J  1J  (i,j)€E  1J  1J 

s.t.  E a . -x  • • = b 1 
(1.j)€E  '3  '3 


(4.3.9) 


1 S X s u 


where  c*. . = c . . + u . 

iJ 


- u. 


for  all  ( i , j ) € E 


e'  = z y,b 


i€V 


a..  » dk  - dk 
1J  1 J 


for  all  (i,j)  € E 


b'  = z djb.. 
i€V  1 1 


(4.3.9)  is  a slightly  more  general  problem  than  the  one  treated  by 
Helgason  et  al . [1980],  where  the  coefficients  of  the  equality 
constraint  are  one  and  the  lower  bounds  are  zero.  Below,  their  result 
is  extended  for  this  problem. 

The  central  idea  for  solving  (4.3.9)  is  to  consider  its  optimality 
conditions  and  find  the  Lagrangian  multiplier  for  its  equality 
constraint.  The  KKT  conditions  are 

D i j x i j + cij  + aija  + Vij  - Wij  = 0 for  all  (1,j)  € E (4.3.10) 

wij(1ij  - xij)  = 0 for  all  (i,j)  € E (4.3.11) 

vij(xij  - uij)  = 0 for  all  ( i , j ) 6 E (4.3.12) 

w,  v ^ 0 (4.3.13) 


and  x feasible. 
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The  following  solution  as  a function  of  a may  be  obtained 


wij(a) 

Vij<“) 


= max 


“cii  ' aiia 

min  { J J_,  u },  i 

D..  J J 

TJ 


max  [ D^.j  * c’.j  * aija»  0] 
max  [-DijUjj  - c;.j  - a1j0,  0] 


(4.3.14) 


Clearly,  the  above  solution  satisfies  the  bounding  constraints. 


i.e. , 


1 s x s u 
and  w,  v £ 0. 


The  following  three  lemmas  show  that  (4.3.10)  - (4.3.12)  are  also 
sati sf ied. 

Lemma  4,3.3:  (4.3.14)  satisfies  (4.3.10). 

Proof:  For  each  (i,j)  € E,  consider  the  following  three  cases: 

1)  If  (-c-j  - Sifl/Dij  4 then  Xij  = Ulj. 

' Cij  ' alf  2 and  “ij'ij  * cij  * aija  S 


c ’ . - a . .a, 
ij  ij  * 


■ CS:  - aij“ 2 0 '“P"65  that  vij  2 -°ijuij 
and  + c’.j  + a-jj  a * 0 implies  that  w-.  = 0. 

fhen,  D1jXlj  * cjj  * a1j0  * v,j  - 

" DijUij  * cij  * aij“  - Dijuij  ' cij  - aij“  - 0 * °- 
li)  If  lfj  < (-c*.j  - a^-aJ/D^.  < u1jf  then 

x1j  = ("cij  - aif  >/Dij-  -Dijuij  - cij  - aij“  2 
and  D^jl . j * c’.j  * a1ja  < 0. 

-D^u^.  - cjj  - a^.o  < 0 implies  that  v..  = 0,  and 

D.j  jl  j j + c’.j  + a—a  < 0 implies  that  w^.  = 0. 
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Then,  DijXiJ  * c?J  * ajjtI  * 

= D1j(-c5j  - a,j«)/01j  ♦ c5j  * ajf  *0-0-0. 
iii)  If  (-cjj  - aija)/Dij  * 1,-j,  then  xsj  - l1Jf 

ViJ  * cij  * aij“  2 °>  and  -°ijuij  - cij  ' aif  5 o- 

°ij1ij  + cij  + aij°  * 0 ^plies  that  Wij  = Dijlij  + c\.  * a..a,  and 
"D-jjUjj  - cij  - a1- ja  s 0 implies  that  v..  = 0. 

Then,  DijXlJ  * ctj  * a,.,*  * - w(J 

= Vu  * Cij  * aij“  * ° ' “ij’lj  ‘ cij  - aij“  - °-S 

Lemma  4.3.4:  (4.3.14)  satisfies  (4.3.11). 

Proof:  For  each  (i,j)  € E,  consider  the  following  two  cases: 

i)  If  D-jjl-jj  + cij  + a^.a  s 0,  then  w^.  = 0.  This  implies  that 
Wij^ij  " Xij^  = °* 

ii)  If  D i j 1 i j + cij  + a.ja  > 0,  then  1^  > (-cij  - a1ja)/D1j.  This 
implies  that  x..  = 1..  and,  thus,  wij(l..  - x^)  = 0.| 

Lemma  4.3.5:  (4.3.14)  satisfies  (4.3.12). 

Proof : For  each  (i,j)  6 E,  consider  the  following  two  cases: 

i)  If  -D.jjU.jj  - cij  - a^.a  s 0,  then  v^.  = 0.  This  implies  that 
vij(xij  - uijJ  = °- 

If  -DljUij  ‘ Cij  ‘ aif  > °-  then  <-cij  - aij“)/Dij  > uij- 
This  implies  that  and,  thus,  v^fx^.  - u^.)  = 0.JS 

Let  h(*)  be  defined  as 


h(°0  = E aiixii(a) 

(1,j)€E  1J  1J 


[~C  * • — d • -Ct 

min  { ^ ^ — , u s , ]. . 


(1.J)€E  1J 


D.  . 
TJ 


1J 
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!✓ 

Hence,  solving  (4.3.9)  is  equivalent  to  finding  o such  that 
h(ak)  = b'. 

The  theorem  below  is  necessary  for  the  convergence  of  the  algorithm 
for  ak. 

Theorem  4.3.3:  The  function  h(a)  is  piecewi se-1 inear  and  continuous 
nonincreasing. 

Proof:  Since  for  any  (i,j)  € E,  x-jj(#)  is  piecewi se-1  i near  and 
continuous,  h(»)  is  also  piecewise-1 inear  and  continuous. 

In  addition,  if  is  zero  or  positive  (negative),  then  x.. (•)  is 
nonincreasing  (nondecreasing).  Thus,  a^.x^.f*)  is  always  nonincreasing 
which  implies  that  h(*)  is  nonincreasing. f| 

For  any  (i,j)  € E,  x-jj(*)  has  at  most  two  breakpoints,  i.e., 

(-c’-j  - Di jUj j)/a. j and  (-cjj  - Dijlij)/a1j.  Therefore,  the  total 
number  of  different  breakpoints  of  h ( • ) , denoted  as  s,  is  less  or  equal 
than  2n.  Let  , ...,  a$  be  these  breakpoints  in  increasing  order.  The 
algorithm  below  is  based  on  the  bisection  method  to  obtain  an  interval 
defined  by  two  consecutive  breakpoints,  (a  , or+^),  containing  a that 
is  finally  generated  by  interpolation. 

1/ 

Algorithm  for  a 

Step  0:  Let  p = 1,  q = s,  U = h(a  ) = Z max  { a - - 1 . , , a-.u..}, 

p (i,j)€E 

and  L = h(o  ) = Z min  { a - - 1 - • , a-.u,.}. 

q (i,j)€E  J J J J 
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Step  1:  (Bracketing) 

If  q - p = 1,  go  to  3. 

Otherwise,  set  r = |^  (p  + q)J  , where  j^xj  is  the  largest 
integer  less  or  equal  than  x,  and  go  to  2. 

Step  2:  Compute  h(ar)  and  consider  the  following  three  cases: 

i)  If  h(ar)  = b ' , ar  is  a solution  and  terminate. 

ii)  If  h(ar)  < b1,  set  p = r,  L = h(ar)  and  go  to  1 

iii)  If  h(ar)  > b ' , set  q = r,  U = h(ar)  and  go  to  1. 

Step  3:  (Interpolation) 


a = 


«p)(b'  - L) 
(U  - L) 


a is  a solution  and  terminate. 


The  work  involved  in  the  initialization  of  the  algorithm  in  sorting 
the  breakpoints  of  h(«)  is  not  a hard  task  but  may  be  time  consuming 
since  it  has  to  be  done  at  each  iteration  of  the  dual  ascent  method. 
However,  as  the  method  progresses,  the  dual  vectors  converge,  so  that 
the  ordering  of  the  breakpoints  should  stabilize,  i.e.,  the  order  at  a 
certain  iteration  should  have  a good  chance  of  being  the  one  needed  for 
the  next.  Thus,  the  sorting  algorithm  employed  to  do  this  task  can  be 
enhanced  by  restarting  from  the  old  order. 

Since  a £ 0,  another  computational  enhancement  for  the  dual  ascent 
method  may  be  achieved  by  eliminating  the  breakpoints  that  are  negative 
after  modifying  the  values  of  p and  U in  Step  0 as  follows 

p = arg  max  {ai  : a.  £ 0,  i =1,  ...»  s} 

U = h(ap). 
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4.3.2  Examples 

Table  4-4  lists  the  data  of  two  small  examples  that  have  been  used 
to  test  the  dual  ascent  method.  The  first  example  has  4 nodes  and  5 
arcs,  and  the  second  12  nodes  and  22  arcs. 

Tables  4-5  and  4-6  report  the  solution  of  these  two  problems  using 
the  steepest  ascent  and  two  CG  methods  to  compute  dk.  The  CG  methods 
for  Example  1 restart  at  every  3 iterations,  and  for  Example  2 at  every 
18  iterations. 
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Table  4-4:  Data  for  Examples  1 and  2. 
a)  Data  for  Example  1: 


Arcs  Nodes 


(i»j) 

Dfj 

cij 

bj 

uij 

i 

bi 

(1,2) 

10. 

1. 

2. 

8. 

1 

6. 

(1,3) 

2. 

1. 

0. 

1. 

2 

0. 

(2,3) 

8. 

2. 

3. 

5. 

3 

0. 

(2,4) 

2. 

1. 

0. 

4. 

4 

-6. 

(3,4) 

2. 

1. 

0. 

6. 

b)  Data 

for  Example  2: 

Arcs 

Nodes 

(i,  j) 

D. . 

c . . 

1 . . 

u . . 

i 

| 

ij 

ij 

TJ 

(1,3) 

.8 

1. 

0. 

11. 

1 

(1,6) 

1.0 

4. 

2. 

8. 

2 

(2,3) 

.6 

3. 

0. 

5. 

3 

(2,4) 

.2 

7. 

8. 

9. 

4 

(3,4) 

.2 

5. 

0. 

5. 

5 

(3,5) 

.4 

2. 

9. 

11. 

6 

(3,6) 

.2 

1. 

0. 

5. 

7 

(4,6) 

.8 

9. 

3. 

7. 

8 

(4,7) 

.8 

7. 

0. 

2. 

9 

(5,7) 

1.0 

3. 

0. 

12. 

10 

(5,8) 

1.0 

2. 

5. 

10. 

11 

(6,8) 

.2 

1. 

0. 

5. 

12 

(6,10) 

.2 

4. 

2. 

12. 

(7,9) 

.4 

5. 

0. 

10. 

(7,12) 

.6 

3. 

0. 

6. 

(8,9) 

.8 

8. 

0. 

1. 

(8,10) 

.8 

2. 

0. 

10. 

(8,11) 

. 6 

4. 

2. 

6. 

(9,11) 

.6 

9. 

2. 

10. 

(10,9) 

.6 

7. 

1. 

5. 

(10,11) 

.2 

1. 

0. 

10. 

(10,12) 

.4 

13. 

4. 

15. 

ooooooooco 
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Table  4-5:  Solution  of  Example  1. 


Method 

Iter. 

g(v) 

l«9t»)| 

% RE 

0 

64.000 

7.8740 

68.0000 

1 

108.393 

6.1707 

45.8037 

2 

132.159 

6.1467 

33.9204 

3 

159.318 

3.2759 

20.3408 

Steepest 

4 

180.105 

2.8462 

9.9473 

Ascent 

5 

187.621 

2.2238 

6.1897 

10 

198.177 

1.0934 

0.9116 

20 

199.948 

0.1846 

0.0260 

30 

199.999 

0.0312 

0.0007 

33 

200.000 

0.0128 

0.0002 

Fletcher 

Reeves 

0 

1 

2 

3 

4 

5 
5 

64.000 

108.393 

174.992 

194.797 

198.228 

199.921 

200.000 

7.8740 

6.1707 

2.4746 

3.1537 

0.6624 

0.2906 

0.0000 

68.0000 

45.8037 

12.5041 

2.6015 

0.8858 

0.0397 

0.0000 

0 

64.000 

7.8740 

68.0000 

1 

108.393 

6.1707 

45.8037 

Pol  ak 

2 

174.992 

2.4746 

12.5041 

Ribiere 

3 

194.797 

3.1537 

2.6015 

4 

198.228 

0.6624 

0.8858 

5 

199.921 

0.2906 

0.0397 

6 

200.000 

0.0000 

0.0000 
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Table  4-6:  Solution  of  Example  2. 


Method 

Iter. 

g(w) 

P«g(p)|| 

% RE 

0 

259.000 

22.5389 

59.5086 

1 

475.218 

16.6512 

25.7055 

2 

506.759 

19.7332 

20.7745 

Steepest 

3 

543.390 

14.0195 

15.0477 

Ascent 

5 

565.842 

10.6158 

11.5376 

10 

584.883 

7.3529 

8.5608 

20 

610.801 

7.1419 

4.5088 

50 

636.879 

1.5225 

0.4319 

100 

639.403 

0.5282 

0.0373 

200 

639.641 

0.0291 

0.0001 

0 

259.000 

22.5389 

59.5086 

1 

475.218 

16.6512 

25.7055 

2 

511.898 

20.5915 

19.9711 

FI  etcher 

3 

536.549 

20.7944 

16.1172 

Reeves 

5 

569.577 

9.6002 

10.9537 

10 

612.532 

8.4865 

4.2382 

20 

631.022 

4.9819 

1.3475 

30 

639.322 

1.5036 

0.0499 

40 

639.640 

0.0736 

0.0002 

43 

639.641 

0.0319 

0.0000 

0 

259.000 

22.5389 

59.5086 

1 

475.218 

16.6512 

25.7055 

2 

511.898 

20.5915 

19.9711 

Pol  ak 

3 

536.232 

20.8755 

16.1668 

Ribiere 

5 

577.648 

9.8241 

9.6920 

10 

623.873 

5.9887 

2.4652 

20 

638.940 

1.8017 

0.1096 

30 

639.639 

0.1066 

0.0003 

32 

639.641 

0.0427 

0.0000 

CHAPTER  FIVE 


SUMMARY 

This  dissertation  has  presented  research  directed  towards  the 
solution  of  models  that  can  be  formulated  as  optimization  programs  with 
a pseudoconvex  objective  function  and  network  constraints.  In  practice, 
these  models  tend  to  be  large,  with  many  thousands  of  arcs  (variables) 
and  nodes  (equality  constraints).  They  are  at  least  an  order  of 
■ magnitude  larger  than  problems  that  can  be  handled  routinely  by  standard 
state-of-the-art  codes  for  linearly  constrained  nonlinear  optimization 
and,  therefore,  it  is  necessary  to  develop  algorithms  that  exploit  the 
special  structure  of  the  model  in  order  to  be  able  to  conserve  computer 
memory  requirements  and  to  obtain  an  accurate  solution  in  reasonable  CPU 
time. 

The  restricted  simplicial  algorithm  developed  in  Chapter  Three  is 
a general  form  of  decomposition  for  linearly  constrained  nonlinear 
programs  that  blends  first  and  second  order  methods.  The  linear 
subproblems  for  networks  are  solved  by  specialized  algorithms  that 
permit  the  practical  solution  of  extremely  large  problems.  It  is  not 
unusual  to  obtain  solution  times  for  the  subproblems  that  are  one 
hundredth  of  the  time  of  general  purpose  LP  codes.  The  nonlinear  master 
problem,  with  nonnegativity  constraints  only,  has  small  size  (at  most  r 
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variables)  and  can  be  solved  to  high  accuracy  with  two  or  three  minor 
iterations  of  the  projected  Newton  method  for  separable  objective 
functions  or  the  projected  quasi-Newton  for  nonseparable  objectives. 

The  efficiency  of  the  master  problem  relies  on  the  fact  that  the  initial 
solution  is  optimal  with  respect  to  the  simplex  defined  by  the  extreme 
points  generated  in  previous  iterations.  The  finite  convergence  proof 
is  a key  result  that  characterizes  the  class  of  models  that  are 
especially  suitable  for  this  procedure,  i.e.,  models  for  which  the 
optimal  solution  belongs  to  a face  of  the  feasible  region  with  small 
dimension. 

Separable  strictly  convex  quadratic  network  problems  are  an 
important  class  of  nonlinear  networks  for  which  the  dual  conjugate 
gradient  based  methods  of  Chapter  Four  are  highly  efficient.  These 
methods  take  full  advantage  of  the  sparsity  and  structure  of  the 
constraint  matrix.  It  has  been  shown  that  for  the  case  of  undirected 
arcs  the  dual  approach,  that  reduces  to  solving  a symmetric  system  of 
linear  equations,  outperforms  the  primal  method  of  Chapter  Three.  When 
the  flows  have  upper  and  lower  bounds,  the  dual  approach  also  looks 
promising  for  large  problems,  however,  a complete  computational  study 
along  with  the  analysis  of  preconditioning  techniques  to  enhance  the 
performance  of  the  conjugate  gradient  methods  employed,  has  not  been 
done  yet.  Another  important  question  that  needs  to  be  investigated  is 
the  extension  of  these  methods  to  separable  strictly  convex  functions  by 
successive  quadratic  approximation  at  each  iteration. 


APPENDIX  A 


RSD  FOR  AN  UNBOUNDED  FEASIBLE  REGION 


Assumption  3.2.2  states  that  the  feasible  region  of  Problem 
(3.2.1),  denoted  as  S,  is  bounded.  Although  this  assumption  is  not  too 
restrictive,  it  is  still  of  interest  to  extend  the  RSD  algorithm  to  the 
unbounded  case. 

If  S is  unbounded,  it  is  possible  that  a solution  may  not  exist. 
Thus,  in  order  to  guarantee  the  existence  of  a solution.  Assumption 
3.2.2  is  replaced  by  the  following: 

Assumption  A.l:  For  any  direction  d € Rn  such  that  x + ad  € S for  all 
a £ 0,  the  following  condition  holds 

lim  f(x  + ad)  = ” 
a— >°° 

In  addition,  since  the  linear  subproblem  may  generate  either  an 
extreme  point  or  an  extreme  direction,  the  feasible  region  of  the  master 
problem  will  be  defined  by  the  following  class  of  generalized  simplices 
[Rockafellar  1970]. 

Definition:  Let  {Zg,  z^,  ...,  Zp}  be  p+1  distinct  points  in  Rn  and 
{ulf  ...»  Up}  be  q distinct  directions  in  Rn  with  p+q  S n,  where  the 
vectors  z^  - Zg,  . . . , Zp  - Zg,  u^ , ...,  u^  are  linearly  independent. 
Then,  the  set 


no 
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p q p 

C = {x  : x = E a - z ■ + E 0 • u - , E a-  = 1,  a • > 0,  i = 0,  . . . , p, 
i=0  1 1 j=l  J J i=0  1 1 

8 . * 0,  j = 1,  . .. , q} 

is  called  a (p+q) -general ized  simplex  in  Rn.  In  addition,  since  C is 
always  contained  in  a manifold  of  dimension  p+q,  a (p+q) -general ized 
simplex  is  said  to  have  dimension  p+q,  written  as  dim  C = p+q. 

The  following  version  of  RSD  takes  into  account  the  extreme 
directions  in  the  feasible  region  and,  as  in  the  basic  algorithm,  it  is 
assumed  that  the  total  number  of  extreme  points  and  extreme  directions 
is  restricted  by  a parameter  r. 

Step  0:  Let  x®  be  a feasible  point  of  S.  Set  W^=0,  w|j=0,  W^={x^} 
k=0. 

Step  1:  (Subproblem) 

Sol  ve 

min  { 6f (x^ )y  : y 6 S}. 

If  the  subproblem  generates  an  extreme  point,  y , such  that 

k k k k 

6f(x  )(y  - x ) £ 0,  or  an  extreme  direction,  d , such  that 

k k k 

5f(x  )d  > 0,  x is  a solution  and  terminate. 

Otherwise, 

i)  If  |Wks|  + |Wk|  < r,  set  Wk+1=Wk  U {yk}  or 
Wk+1=Wk  U {dk/||dk||} , and  Wk+1=Wk. 

ii)  If  |Wk|  + |W^|  = r,  eliminate  the  element  of  Wk  or 
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with  the  minimal  weight  in  the  expression  of  x as  a 

linear  combination  of  W U W*.  Set  W*  =W*  U {yK}  or 

U (dk/||dk)},  and  Wk+*={xk} . 

Set  Wk+1=Hk+1U  Wk+k  and  go  to  2. 

Step  2:  (Master  problem) 

Let  xk+1  € arg  min  { f (x ) : x € H(Wk+1,  Wk+1)} 

|(+1  |<+1 

where  H(W  , ) is  the  generalized  simplex  defined  by 

Wk+1  and  Wk+1. 

k+1  k'+l  k+1 

Purge  1 and  Wx  1 of  all  the  elements  with  zero 

k+1 

weight  in  the  expression  of  x as  a linear  combination  of 
the  elements  of  Wk+*  U W^+^.  Set  k=k+l  and  go  to  1. 

The  global  and  finite  convergence  for  this  version  of  RSD  is 
similar  to  the  one  for  the  basic  algorithm  and  is  omitted. 


APPENDIX  B 


RSD  FOR  NONLINEAR  CONSTRAINTS 

The  RSD  algorithm  can  be  further  extended  to  the  case  where  some  of 
the  constraints  are  nonlinear.  Consider  the  problem 

min  f(x) 

s.t.  g j (x ) S 0 for  j = 1,  ... , q (B.l) 

A x S b 

i 

Define  the  following  sets: 

s!  = {x  : Oj (x ) s 0,  j = 1,  ...,  q} 

S2  = {x  : A x S b} 

S = sl  n s2 

and  consider  the  assumptions  below,  in  addition  to  Assumptions  3.2.1  and 
3.2.2. 

Assumption  B.l:  g.(»)  is  a continuously  differentiable  quasiconvex 

J 

function  for  j =1,  . . . , q. 

Assumption  B.2:  For  each  x € S,  the  set  { 6g - (x ) : g.(x)  = 0, 

J 3 

j = 1,  ....  q}  is  linearly  independent. 

Since  the  level  sets  of  quasiconvex  functions  are  convex  sets. 
Assumption  B.l  is  necessary  to  guarantee  that  the  feasible  regions  of 
the  master  problem  are  subsets  of  S.  The  second  assumption  avoids  the 
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case  of  having  nonlinear  equality  constraints  in  (B.l)  since  the 
possibility  of  replacing  g,(x)  = 0 by  g.(x)  s 0 and  -g.(x)  S 0 is  ruled 

J J J 

out. 

The  algorithm  described  below  is  basically  a generalization  of  the 
algorithm  of  Topkis  and  Veinott  [1967]  with  the  following  improvements: 

i)  The  line  search  step  has  been  replaced  by  a minimization  on  a 
simplex  defined  by  some  boundary  points  as  well  as  some  interior  points. 

ii)  The  direction  finding  subproblem  is  a scaled  version  of  the 
standard  subproblem  with  respect  to  the  set  of  nonlinear  constraints. 

iii)  x,  the  center  of  the  box  that  defines  the  normalization 

k k 

constraints  (i.e.,  1 s d s u ),  only  changes  when  the  progress  of  the 
algorithm  has  been  "satisfactory."  This  is  necessary  for  the 
finite  convergence  proof. 

Step  0:  Let  x^1  be  a feasible  point  of  S.  Set  x=x^,  l^=-ye,  u^=ye, 

W°=0,  W^={ x °}  and  k=0,  where  y>0  and  e=[l,  ...,  1]T. 

Step  1:  (Subproblem) 

k k 

If  ! 6 f ( x ) ||  = 0,  x is  a solution  and  terminate, 
k k 

Otherwise,  let  (z  , d ) be  an  optimal  solution  to  the 
following  subproblem: 
min  z 

s.t.  5f(xk)d  - || 6 f ( x k ) || z s 0 

<59j(xk)d  - ||  <5  g j (xk)  | z S -gj(xk)  for  j = 1,  ...,  q 
A d s b - A xk 
lk  s d s uk 
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k k 

If  z * 0,  x is  a solution  and  terminate. 

Otherwise,  let 

a = min  {1,  sup  {a  : gj(xk  + adk)  s 0,  j = 1,  . . . , q} 
yk  = xk  + adk . 

i)  ^ |Wk|  < r,  set  Wk+1=Wk  U {yk}  and  Wk+1=Wk. 

k k 

ii)  If  | Ws | = r,  replace  the  element  of  Ws  with  the 

minimal  weight  in  the  expression  of  x as  a convex 

k k 

combination  of  the  elements  of  W with  y to  obtain 

Wk+1  and  let  Wk+1={xk}. 

Set  Wk+1=Wk+1U  Wk+1  and  go  to  2. 

Step  2:  (Master  problem) 

Let  xk+1  € arg  min  {f (x)  : x 6 H(Wk+1)}. 
k+1  k+1 

Purge  Wg  and  Wx  of  all  elements  with  zero  weight  in  the 
k+1 

expression  of  x as  a convex  combination  of  the  elements 
of  Wk+1  and  go  to  3. 

Step  3:  Define  x,  lk+*  and  uk+^  as  follows: 


i)  If  | x - x 
..k+1 


k+1|L  * y/2,  set  lk+1=-ye 


(x  - xk+1)  and 


ye  + (x  - xk+1). 


ii)  If  || x - x 


k+1|l  > y/2,  set  x=xk+1,  lk+1=-ye  and 


k+1 

u =ye. 


Set  k=k+l  and  go  to  1, 


The  global  convergence  of  the  above  algorithm  follows  similar 
arguments  to  those  of  Topkis  and  Veinott  [1967]  and  Zangwill  [1969]  for 
feasible  direction  methods. 
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A finite  convergence  result  similar  to  Theorem  3.4.3  (Section  3.4) 

can  be  obtained  under  Assumption  3.4.1  and  the  following  two: 

* 

Assumption  B.3:  x , the  optimal  solution  of  (B.l),  belongs  to  the 
interior  of  the  set  Sj. 

Assumption  B.4:  p,  the  parameter  that  defines  the  size  of  the 
normalization  constraints,  is  small  relative  to  €,  the  radius  of  the 
largest  open  ball  B (x  , €)  around  x in  Sj. 

Under  Assumptions  B.3  and  B.4  it  can  be  shown  that  there  exists  a 
positive  integer  ir,  such  that  for  all  k^-rr  the  following  hold: 

i)  x does  not  change,  i.e.,  Step  3 ( i i ) is  never  executed. 

• • k k v 

ii)  If  (z  , d ) is  a solution  of  the  subproblem  in  Step  1,  then  y 

k k 

is  always  equal  to  x + d and  is  a solution  of  the  following  equivalent 
subproblem 

min  5f (x^)y 

s.t.  A y S b (B . 2) 

-pe  + x S y <,  pe  + x. 

Note  that  (B.2)  is  similar  to  the  standard  subproblem  of  RSD  for 
linear  constraints  and  its  feasible  region  does  not  change  with  k,  i.e., 

y is  always  an  extreme  point  of  a polytope  defined  by  {y  : A y s b, 

— — ★ 

-pe  + x S y s pe  + x}.  In  addition,  this  polytope  contains  x . Thus, 

★ 

since  x can  be  obtained  by  minimizing  f(x)  in  the  polytope,  a pattern 
similar  to  Section  3.4  can  be  used  for  the  finite  convergence  proof. 

A preliminary  computational  experimentation  of  the  above  algorithm 
with  large  quadratical ly  constrained  quadratic  programs  has  shown  that 
the  behavior  of  the  algorithm  is  similar  to  the  linearly  constrained 
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case,  i.e.,  the  progress  of  the  algorithm  improves  as  r increases.  This 
leads  to  the  conclusion  that  the  replacement  of  the  line  search  step  by 
a minimization  on  a simplex  can  improve  the  efficiency  of  some  feasible 
direction  methods,  and  make  them  useful  and  competitive  for  solving 
large  nonlinear  programming  problems. 


APPENDIX  C 


FURTHER  COMPUTATIONAL  NOTES 


Computer  Environment 

The  FORTRAN  programs  for  the  VAX  11/750  Computer  (Subsections  3.7.1 
and  4.3.2)  were  compiled  using  the  FORTRAN  4.0  Compiler  and  executed 
under  the  VMS  version  4.3  Operating  System. 

The  FORTRAN  programs  for  the  IBM  3081D  Computer  (Subsections  3.7.2, 
3.7.3,  3.7.4  and  4.2.2)  were  compiled  using  the  FORTRAN  H/Extended 
Compiler  and  executed  under  the  OS/MVS  Operating  System. 

Special  Computer  Science  Techniques 

For  all  of  the  network  problems  solved  in  this  work,  the  network 
data  was  stored  as  standard  linked  lists.  For  details  see  Kennington 
and  Helgason  [1980]. 

Storage  Requirements 

The  storage  requirements  for  the  RSD  algorithm  of  Chapter  3 may  be 
divided  between  requirements  for  the  master  problem  and  the  subproblem. 
Since  RSD  may  be  integrated  with  any  type  of  linear  programming  code, 
the  subproblem  requirements  would  depend  on  the  problem  type.  The 
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2 

principal  arrays  for  the  RSD  master  problem  require  (r  + 4)n  + r + 8r 
memory  locations. 

The  principal  arrays  for  the  algorithms  of  Chapter  4 require  4n  + 4m 
memory  locations  for  the  CG  algorithm,  4n  + 6m  for  the  PCG  algorithm, 
and  9n  + 8m  for  the  dual  ascent  method. 

Parameters  and  Tolerances 

A preliminary  experimentation  with  the  parameters  and  tolerances  of 

the  RSD  master  problem  led  to  the  following  choices: 

i)  Stopping  criteria:  RGAP^  S where  + min  {€^,  0 . 1*RG AP^} > 

RGAP^  is  the  relative  gap  for  the  master  problem,  RGAP^  is  the  relative 

0 -4 

gap  for  the  original  problem,  and  = 10  . 

-4 

i i )  Armijo  parameters:  6 = 0.5  and  o = 10 

iii)  Parameter  defining  the  set  of  near  binding  constraints  I+: 

e2  = io'2. 

Availability  of  the  codes 

A tape  containing  the  experimental  codes  in  EBCDIC  format  is  on 
file  with  the  ISE  departmental  copy  of  the  dissertation. 
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